Abstract


This  dissertation  presents  a  framework  for  verifying  concurrent 
message-passing  C  programs  in  an  automated  manner.  The  methodology 
relies  on  several  key  ideas.  First,  programs  are  modeled  as  finite 
state  machines  whose  states  are  labeled  with  data  and  whose  transitions 
are  labeled  with  events.  We  refer  to  such  state  machines  as  labeled 
Kripke  structures  (LKSs).  Our  state/event-based  approach  enables 
us  to  succinctly  express  and  efficiently  verify  properties  which  involve 
simultaneously  both  the  static  (data-based)  and  the  dynamic  (reactive 
or  event-based)  aspects  of  any  software  system.  Second,  the  framework 
supports  a  wide  range  of  specification  mechanisms  and  notions  of 
conformance.  For  instance,  complete  system  specifications  can  be 
expressed  as  LKSs  and  simulation  conformance  verified  between  such 
specifications  and  any  C  implementation.  For  partial  specifications,  the 
framework  supports  (in  addition  to  LKSs)  a  state/event-based  linear 
temporal  logic  capable  of  expressing  complex  safety  as  well  as  liveness 
properties.  Finally,  the  framework  enables  us  to  check  for  deadlocks 
in  concurrent  message-passing  programs.  Third,  for  each  notion  of 
conformance,  we  present  a  completely  automated  and  compositional 
verification  procedure  based  on  the  counterexample  guided  abstraction 
refinement  (CEGAR)  paradigm.  Like  other  CEGAR-based  approaches, 
these  verification  procedures  consist  of  an  iterative  application  of  model 
construction,  model  checking,  counterexample  validation  and  model 
refinement  steps.  However,  they  are  uniquely  distinguished  by  their 
compositionality.  More  precisely,  in  each  of  our  conformance  checking 
procedures,  the  algorithms  for  model  construction,  counterexample 
validation and model refinement are applied component-wise. The state-space
size of the models is controlled via a two-pronged strategy: (i)
using  two  complementary  abstraction  techniques  based  on  the  static 
(predicate  abstraction)  and  dynamic  (action-guided  abstraction)  aspects 
of  the  program,  and  (ii)  minimizing  the  number  of  predicates  required  for 
predicate  abstraction.  The  proposed  framework  has  been  implemented 
in  the  MAGIC  tool.  We  present  experimental  evaluation  in  support  of 
the  effectiveness  of  our  framework  in  verifying  non-trivial  concurrent  C 
programs against a rich class of specifications in an automated manner.

Glossary of Terms


Note: Several symbols are used in multiple contexts and their meaning depends on
both the symbol and the type of the subscript. For instance, the symbol $S$ is used to
denote the set of states of labeled Kripke structures, Büchi automata, as well as Kripke
structures. Therefore, $S_M$ denotes the set of states of a labeled Kripke structure or
Kripke structure $M$, while $S_B$ denotes the set of states of a Büchi automaton $B$.
Similarly, $AP_M$ denotes the set of atomic propositions of a labeled Kripke structure
$M$, while $AP_\gamma$ denotes the set of atomic propositions specified by a context $\gamma$. Some
other symbols with multiple connotations are $Init$, $\Sigma$, $L$, $T$ (which are used for both
labeled Kripke structures and Büchi automata) and J (which is used for both labeled
Kripke structures and traces).

Chapter 1


Introduction 


The  ability  to  reason  about  the  correctness  of  programs  is  no  longer  a  subject  of 
primarily  academic  interest.  With  each  passing  day  the  complexity  of  software 
artifacts  being  produced  and  employed  is  increasing  dramatically.  There  is  hardly 
any  aspect  of  our  day-to-day  lives  where  software  agents  do  not  play  an  often  silent 
yet  crucial  role.  The  fact  that  many  of  such  roles  are  safety-critical  mandates  that 
these  software  artifacts  be  validated  rigorously  before  deployment.  So  far,  however, 
this  goal  has  largely  eluded  us. 

In this chapter we will first lay out the problem space which is of concern to this
thesis,  viz.,  automated  formal  verification  of  concurrent  programs.  We  will  present 
the  core  issues  and  problems,  as  well  as  the  major  paradigms  and  techniques  that  have 
emerged  in  our  search  for  effective  solutions.  We  will  highlight  the  important  hurdles 
that  remain  to  be  scaled.  The  later  portion  of  this  chapter  presents  an  overview  of 
the  major  techniques  proposed  by  this  thesis  to  surmount  these  hurdles.  The  chapter 
ends with a summary of the core contributions of this dissertation.

1.1 Software Complexity


Several  factors  hinder  our  ability  to  reason  about  non-trivial  concurrent  programs  in 
an automated manner. The first is the sheer complexity of software. Binaries obtained
from hundreds of thousands of lines of source code are routinely executed. The
source code is written in languages ranging from C/C++/Java to ML/OCaml.
These  languages  differ  not  only  in  their  flavor  (imperative,  functional)  but  also  in 
their  constructs  (procedures,  objects,  pattern-matching,  dynamic  memory  allocation, 
garbage  collection),  semantics  (loose,  rigorous)  and  so  on. 

This  sequential  complexity  is  but  one  face  of  the  coin.  Matters  are  further 
exacerbated  by  what  can  be  called  parallel  complexity.  State  of  the  art  software 
agents  rarely  operate  in  isolation.  Usually  they  communicate  and  cooperate  with 
other  agents  while  performing  their  tasks.  With  the  advent  of  the  Internet,  and 
the  advance  in  networking  technology,  the  scope  of  such  communication  could  range 
from  multiple  threads  communicating  via  shared  memory  on  the  same  computer  to 
servers  and  clients  communicating  via  SSL  channels  across  the  Atlantic.  Verifying 
the  correctness  of  such  complex  behavior  is  a  daunting  challenge. 


1.2  Software  Development 

Another,  much  less  visible  yet  important,  factor  is  the  development  process  employed 
in  the  production  of  most  software  and  the  role  played  by  validation  and  testing 
methodologies  in  such  processes.  A  typical  instance  of  a  software  development  cycle 
consists  of  five  phases  -  (i)  requirement  specification,  (ii)  design,  (iii)  design  validation, 
(iv)  implementation  and  (v)  implementation  validation.  The  idea  is  that  defects  found 
in the design (in phase iii) are used to improve the design and those found in the
implementation (in phase v) are used to improve the implementation. The cycle is
repeated  until  each  stage  concludes  successfully. 

Usually  the  design  is  described  using  a  formal  notation  like  UML.  The  dynamic 
behavior  is  often  described  using  Statecharts  (or  some  variant  of  it).  The  design 
validation  is  done  by  some  exhaustive  technique  (like  model  checking).  However, 
what  matters  in  the  final  tally  is  not  so  much  the  correctness  of  the  design  but 
rather  the  correctness  of  the  implementation.  Nevertheless,  in  reality,  verification  of 
the  implementation  is  done  much  less  rigorously.  This  makes  it  imperative  that  we 
focus  more  on  developing  techniques  that  enable  us  to  verify  actual  code  that  will  be 
compiled  and  executed.  A  major  fraction  of  such  code  has  been  written,  continues  to 
be  written  and,  in  my  opinion,  will  continue  to  be  written  in  C. 

Present  day  code  validation  falls  in  two  broad  categories  -  testing  and  formal 
verification.  The  merits  and  demerits  of  testing  [88]  are  well-known  and  thus  it  is 
unnecessary  to  dwell  on  them  in  detail  here.  It  suffices  to  mention  that  the  necessity 
of  being  certain  about  the  correctness  of  a  piece  of  code  precludes  exclusive  reliance  on 
testing  as  the  validation  methodology,  and  forces  us  to  adopt  more  formal  approaches. 


1.3  Software  Verification 

State  of  the  art  formal  software  verification  is  an  extremely  amorphous  entity. 
Originally,  most  approaches  in  this  field  could  be  categorized  as  belonging  to  either 
of  two  schools  of  thought:  theorem  proving  and  model  checking.  In  theorem  proving 
(or deductive verification [70]), one typically attempts to construct a formula $\varphi$ (in
some suitable logic like higher-order predicate calculus) that represents both the
system to be verified and the correctness property to be established. The validity
of $\varphi$ is then established using a theorem prover. As can be imagined, deductive
verification  is  extremely  powerful  and  can  be  used  to  verify  virtually  any  system 
(including  infinite  state  systems)  and  property.  The  flip-side  is  that  it  involves  a  lot 
of  manual  effort.  Furthermore  it  yields  practically  no  diagnostic  feedback  that  can  be 
used for debugging if $\varphi$ is found to be invalid.


1.4  Model  Checking 

Where  theorem  proving  fails,  model  checking  [39]  shines.  In  this  approach,  the 
system to be verified is represented by a finite state transition system $\mathcal{M}$ (often
a Kripke structure) and the property to be established is expressed as a temporal
logic [81] (usually CTL [32] with fairness or LTL [78]) formula $\varphi$. The model checking
problem is then to decide whether $\mathcal{M}$ is a model of $\varphi$. Not only can this process
be automated to a large degree, it also yields extremely useful diagnostic feedback
(often in the form of counterexamples) if $\mathcal{M}$ is found not to model $\varphi$. Owing to these
and other factors, the last couple of decades have witnessed the emergence of model
checking as the preeminent formal verification technique. Various kinds of temporal logics
have been extensively studied [59] and efficient model checking algorithms have been
designed [35,99]. The development of techniques like symbolic model checking [21],
bounded model checking [11,12], compositional reasoning [34] and abstraction [36] has
enabled us to verify systems with enormous state spaces [22].

One  of  the  original  motivations  behind  the  development  of  model  checking  was  to 
extract  and  verify  synchronization  skeletons  of  concurrent  programs,  a  typical  software 
verification  challenge.  Somewhat  ironically,  the  meteoric  rise  of  model  checking 
to fame was largely propelled by its tremendous impact on the field of hardware
verification. I believe that a major factor behind this phenomenon is that model
checking  can  only  be  used  if  a  finite  model  of  the  system  is  available.  Also  since  real 
system  descriptions  are  often  quite  large,  the  models  must  be  extracted  automatically 
or  at  least  semi-automatically.  While  this  process  is  often  straightforward  for 
hardware,  it  is  much  more  involved  for  software.  Typically  software  systems  have 
infinite  state  spaces.  Thus,  extracting  a  finite  model  often  involves  a  process  of 
abstraction  as  well. 


1.5  Predicate  Abstraction 

For  a  long  time,  the  applicability  of  model  checking  to  software  was  somewhat 
handicapped  by  the  absence  of  powerful  automated  model  extraction  techniques.  This 
scenario  changed  with  the  advent  of  predicate  abstraction  [63]  (a  related  notion  called 
data  type  abstraction  used  by  systems  like  Bandera  [8,58]  can  be  viewed  as  a  special 
instance  of  predicate  abstraction).  Even  though  predicate  abstraction  was  quickly 
picked  up  for  research  in  hardware  verification  as  well  [49,50],  its  effect  on  code 
verification  was  rather  dramatic.  It  forms  the  backbone  of  two  of  the  major  code 
verifiers  in  existence,  SLAM  [6, 107]  and  BLAST  [13,66]. 

Predicate abstraction is parameterized by a set of predicates involving the
variables of the concrete system description. It also involves non-trivial use of theorem
provers (in fact, its original use [63] was to create abstract state transition graphs
using  the  theorem  prover  PVS).  Thus  it  has  triggered  a  more  subtle  effect  -  it  has 
caused  the  boundary  between  model  checking  and  theorem  proving  to  become  less 
distinct. 
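
To make the idea concrete, the following is a minimal Python sketch of predicate abstraction by explicit enumeration. It is an illustration under stated assumptions only: the toy counter system, the two predicates and the helper names `successors`, `abstract` and `abstract_model` are hypothetical and do not come from any of the tools cited above, which rely on a theorem prover rather than on enumerating concrete states.

```python
# Hypothetical concrete system: a counter x in 0..5 that is incremented
# until it reaches 5. This toy system is an assumption for illustration.
CONCRETE_STATES = range(6)


def successors(x):
    """Concrete successors of state x."""
    return {x + 1} if x < 5 else set()


# The predicate set Pred: each predicate is a Boolean function of the state.
PRED = [
    lambda x: x == 0,
    lambda x: x < 3,
]


def abstract(x):
    """Map a concrete state to the tuple of truth values of the predicates."""
    return tuple(p(x) for p in PRED)


def abstract_model():
    """Existential abstraction: an abstract transition exists whenever some
    pair of concrete states it represents is linked by a concrete transition."""
    states = {abstract(x) for x in CONCRETE_STATES}
    transitions = {
        (abstract(x), abstract(y))
        for x in CONCRETE_STATES
        for y in successors(x)
    }
    return states, transitions


states, transitions = abstract_model()
print(len(states), "abstract states,", len(transitions), "abstract transitions")
```

In this sketch the six concrete states collapse into three abstract ones. The abstract state in which both predicates are false carries a self-loop, so the abstract model admits an infinite run that the concrete counter does not have; abstraction of this kind is conservative but may add behavior.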

Challenge 1 Predicate abstraction essentially works by aggregating system states
that are similar in terms of their data valuations. It is insensitive to the events that a
system can perform from a given state. Can we develop other notions of abstraction
that leverage the similarities between system states in terms of their dynamic
(event-based) behavior? Such abstractions would complement predicate abstraction and lead
to  further  reduction  of  state-space  size. 

1.6  Abstraction  Refinement 

Even  with  progress  in  automated  model  extraction  techniques,  verifying  large  software 
systems  remains  an  extremely  tedious  task.  A  major  obstacle  is  created  by  the 
abstraction  that  happens  during  model  extraction.  Abstraction  usually  introduces 
additional  behavior  that  is  absent  in  the  concrete  system.  Suppose  that  the  model 
check  fails  and  the  model  checker  returns  a  counterexample  CE.  This  does  not 
automatically  indicate  a  bug  in  the  system  because  it  is  entirely  possible  that  CE 
is an additional behavior introduced by abstraction (such a CE is often called a
spurious  counterexample).  Thus  we  need  to  verify  whether  CE  is  spurious,  and  if 
so  we  need  to  refine  our  model  so  that  it  no  longer  allows  CE  as  an  admissible 
behavior.  This  process  is  called  abstraction  refinement.  Since  the  extracted  models 
and  counterexamples  generated  are  quite  large,  abstraction  refinement  must  be 
automated  (or  at  least  semi-automated)  to  be  practically  effective. 

The  above  requirements  lead  naturally  to  the  paradigm  called  counterexample 
guided  abstraction  refinement  (CEGAR).  In  this  approach,  the  entire  verification 
process is captured by a three-step abstract-verify-refine loop. The actual details of
each step depend on the kind of abstraction and refinement methods being used. The
steps are described below in the context of predicate abstraction, where Pred denotes
the set of predicates being used for the abstraction.


1. Step 1: Model Creation. Extract a finite model from the code using
predicate abstraction with Pred and go to step 2.

2. Step 2: Verification. Check whether the model satisfies the desired property.
If this is the case, the verification successfully terminates; otherwise, extract a
counterexample CE and go to step 3.

3. Step 3: Refinement. Check if CE is spurious. If not, we have an actual bug
and the verification terminates unsuccessfully. Otherwise we improve Pred and
go to step 1. Let us refer to the improved Pred as Pred'. Then Pred' should be
such that CE and all previous spurious counterexamples will be eliminated if
the model is extracted using Pred'.
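
The three steps above can be sketched as the following loop. This is only a schematic Python rendering under stated assumptions: the callbacks `build_model`, `model_check`, `is_spurious` and `refine` are hypothetical placeholders supplied by the caller, the toy instantiation below them is invented for illustration, and the iteration bound is an added safeguard because the loop need not terminate in general.

```python
def cegar(program, spec, pred, build_model, model_check, is_spurious, refine,
          max_iterations=100):
    """Abstract-verify-refine loop.

    Returns ("verified", None) if some abstraction satisfies spec,
    ("bug", ce) if a non-spurious counterexample ce is found, and
    ("unknown", None) if the iteration bound is exhausted.
    """
    for _ in range(max_iterations):
        model = build_model(program, pred)       # Step 1: model creation
        ce = model_check(model, spec)            # Step 2: verification
        if ce is None:
            return "verified", None
        if not is_spurious(program, ce):         # Step 3: counterexample check
            return "bug", ce
        pred = refine(pred, ce)                  # Step 3: improve Pred
    return "unknown", None


# Toy instantiation (all of it hypothetical): the "program" is a set of
# reachable integer values and the property says that BAD is never reached.
BAD = 3


def build_model(program, pred):
    # Abstract state = tuple of predicate values; the model records the
    # predicates together with the abstractly reachable states.
    return pred, {tuple(p(x) for p in pred) for x in program}


def model_check(model, bad):
    pred, reachable = model
    # The abstract model violates the property if BAD's abstract state
    # coincides with an abstractly reachable state.
    return bad if tuple(p(bad) for p in pred) in reachable else None


def is_spurious(program, ce):
    return ce not in program


def refine(pred, ce):
    # Keep the old predicates and add one that separates ce from the rest,
    # so earlier spurious counterexamples stay eliminated.
    return pred + [lambda x, c=ce: x == c]


print(cegar({0, 2, 4}, BAD, [], build_model, model_check, is_spurious, refine))
print(cegar({0, 3}, BAD, [], build_model, model_check, is_spurious, refine))
```

With the empty initial predicate set, the first toy program needs one refinement before the property is established, while the second one yields a genuine counterexample in the first iteration.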

Challenge  2  Software  model  checking  has  focused  almost  exclusively  on  the 
verification  of  safety  properties  via  some  form  of  trace  containment.  It  would  be 
desirable  to  extend  its  applicability  to  more  general  notions  of  conformance  such  as 
simulation and richer classes of specifications such as liveness.

Challenge  3  The  complexity  of  predicate  abstraction  is  exponential  in  the  number 
of  predicates  used.  The  naive  abstraction  refinement  approach  keeps  on  adding  new 
predicates  on  the  basis  of  spurious  counterexamples.  Previously  added  predicates 
are not removed even if they have been rendered redundant by predicates discovered
subsequently. Can we improve this situation?

1.7 Compositional Reasoning


CEGAR  coupled  with  predicate  abstraction  has  become  an  extremely  popular 
approach  toward  the  automated  verification  of  sequential  software,  especially  C 
programs  [13]  such  as  device  drivers  [107].  However,  considerably  less  research  has 
been devoted towards the application of these techniques for verifying concurrent
programs. 

Compositional  reasoning  has  long  been  recognized  as  one  of  the  most  potent 
solutions  to  the  state-space  explosion  which  plagues  the  analysis  of  concurrent 
systems.  Compositionality  appears  explicitly  in  the  theory  of  process  algebras  such 
as CSP [69], CCS [85] and the $\pi$-calculus [86]. A wide variety of process algebraic
formalisms  have  been  developed  with  the  intention  of  modeling  concurrent  systems 
and  it  is  therefore  natural  [9]  to  investigate  whether  process  algebraic  concepts  are 
useful  in  the  verification  domain  as  well. 

One  of  the  key  concepts  arising  out  of  the  process  algebraic  research  is  the  need  to 
focus  on  communication  [85]  when  reasoning  about  concurrent  systems.  For  instance 
CSP  advocates  the  use  of  shared  actions  as  the  principal  communication  mechanism 
between  concurrent  components  of  a  system.  Moreover,  shared  action  communication 
can  model  message-passing  C  programs  such  as  client-server  systems  and  web-services 
in  a  very  natural  manner. 

Challenge  4  The  CEGAR  paradigm  has  been  used  with  considerable  success  on 
sequential programs. Can we also use it to compositionally verify concurrent
programs?  What,  if  any,  are  the  restrictions  that  we  might  need  to  impose  in  order 
to  achieve  this  goal? 


1.8  State/event  based  Analysis 


A  major  difficulty  in  applying  model  checking  for  practical  software  verification 
lies  in  the  modeling  and  specification  of  meaningful  properties.  The  most  common 
instantiations  of  model  checking  to  date  have  focused  on  finite-state  models  and  either 
branching-time  (CTL  [32])  or  linear-time  (LTL  [78])  temporal  logics.  To  apply  model 
checking  to  software,  it  is  necessary  to  specify  (often  complex)  properties  on  the 
finite-state  abstracted  models  of  computer  programs.  The  difficulties  in  doing  so  are 
even  more  pronounced  when  reasoning  about  modular  software,  such  as  concurrent  or 
component-based  sequential  programs.  Indeed,  in  modular  programs,  communication 
among  modules  proceeds  via  actions  (or  events),  which  can  represent  function  calls, 
requests  and  acknowledgments,  etc.  Moreover,  such  communication  is  commonly 
data-dependent.  Software  behavioral  claims,  therefore,  are  often  specifications  defined 
over  combinations  of  program  actions  and  data  valuations. 

Existing  modeling  techniques  usually  represent  finite-state  machines  as  finite 
annotated  directed  graphs,  using  either  state-based  or  event-based  formalisms.  It 
is  well-known  that  the  two  frameworks  are  interchangeable.  For  instance,  an 
action  can  be  encoded  as  a  change  in  state  variables,  and  likewise  one  can  equip 
a  state  with  different  actions  to  reflect  different  values  of  its  internal  variables. 
However,  converting  from  one  representation  to  the  other  often  leads  to  a  significant 
enlargement  of  the  state  space.  Moreover,  neither  approach  on  its  own  is  practical 
when  it  comes  to  modular  software,  in  which  actions  are  often  data-dependent: 
considerable  domain  expertise  is  then  required  to  annotate  the  program  and  to  specify 
proper  claims. 

Challenge 5 Can we develop a formalism for succinctly expressing and efficiently
verifying state/event-based properties of programs? In particular we should be able
to verify a state/event system directly without having to translate it to an equivalent
pure-state or pure-event version. Further, can we combine state/event-based analysis
with  a  compositional  CEGAR  scheme? 

1.9  Deadlock  Detection 

Ensuring  that  standard  software  components  are  assembled  in  a  way  that  guarantees 
the  delivery  of  reliable  services  is  an  important  task  for  system  designers.  Certifying 
the  absence  of  deadlock  in  a  composite  system  is  an  example  of  a  stringent  requirement 
that  has  to  be  satisfied  before  the  system  can  be  deployed  in  real  life.  This  is  especially 
true  for  safety-critical  systems,  such  as  embedded  systems  or  plant  controllers,  that 
are  expected  to  always  service  requests  within  a  fixed  time  limit  or  be  responsive  to 
external  stimuli. 

In  addition,  many  formal  analysis  techniques,  such  as  temporal  logic  model 
checking  [32,  39],  assume  that  the  systems  being  analyzed  are  deadlock-free.  In  order 
for  the  results  of  such  analysis  to  be  valid,  one  usually  needs  to  establish  deadlock 
freedom  separately.  Last  but  not  least,  in  case  a  deadlock  is  detected,  it  is  highly 
desirable to be able to provide system designers and implementers with appropriate
diagnostic  feedback. 

However,  despite  significant  efforts,  validating  the  absence  of  deadlock  in  systems 
of  realistic  complexity  remains  a  major  challenge.  The  problem  is  especially  acute  in 
the  context  of  concurrent  programs  that  communicate  via  mechanisms  with  blocking 
semantics,  e.g.,  synchronous  message-passing  and  semaphores.  The  primary  obstacle 
is the well-known state space explosion problem whereby the size of the state space
of a concurrent system increases exponentially with the number of components.

As  mentioned  before,  two  paradigms  are  usually  recognized  as  being  the  most 
effective  against  the  state  space  explosion  problem:  abstraction  and  compositional 
reasoning.  Even  though  these  two  approaches  have  been  widely  studied  in  the 
context  of  formal  verification  [36,64,67,84],  they  find  much  less  use  in  deadlock 
detection. This is possibly a consequence of the fact that deadlock is inherently
non-compositional and its absence is not preserved by standard abstractions. Intuitively,
the  fundamental  problem  here  is  that  deadlock  is  an  existential  safety  property. 
Therefore,  a  compositional  CEGAR  scheme  for  deadlock  detection  would  be  especially 
significant. 

Challenge  6  In  the  light  of  the  above  discussion,  can  we  develop  a  compositional 
CEGAR-based  procedure  for  deadlock  detection? 

1.10  Summary 

This  dissertation  presents  a  framework  for  verifying  concurrent  message-passing  C 
programs  with  specific  emphasis  on  addressing  the  challenges  enumerated  earlier  in 
this chapter. Among other things, we address Challenge 5 by enabling both state-based
and action-based properties to be expressed, combined, and efficiently verified.
To this end we propose the use of labeled Kripke structures (LKSs) as the modeling
formalism. In essence, an LKS is a finite state machine in which states are labeled
with  atomic  propositions  and  transitions  are  labeled  with  events  (or  actions).  In  the 
rest  of  this  chapter  we  will  refer  to  a  concurrent  message-passing  C  program  as  simply 
a  program. 

Our state/event-based modeling methodology is described in two stages. We first
present a semantics of programs in terms of LKSs (cf. Chapter 3). We then develop a
generalized  form  of  predicate  abstraction  to  construct  conservative  LKS  abstractions 
from  programs  (cf.  Chapter  4)  in  an  automated  manner.  We  provide  formal 
justification  for  our  claim  that  the  extracted  LKS  models  are  indeed  conservative 
abstractions  of  the  concrete  programs  from  which  they  have  been  constructed. 

Subsequently  we  address  Challenge  2  and  Challenge  4  by  presenting  a 
compositional  CEGAR  procedure  for  verifying  simulation  conformance  between  a 
program  and  an  LKS  specification.  We  define  the  notion  of  witness  LKSs  as 
counterexamples  to  simulation  conformance  and  present  algorithms  for  efficiently 
constructing  such  counterexamples  upon  the  failure  of  a  simulation  check  (cf. 
Chapter  5).  We  next  present  algorithms  for  checking  the  validity  of  witness  LKSs  and 
refining  the  LKS  models  if  the  witness  is  found  to  be  spurious  (cf.  Chapter  6).  The 
entire  CEGAR  procedure  is  compositional  in  the  sense  that  the  model  construction, 
witness validation and abstraction refinement are performed component-wise. Note
that  we  do  not  delve  into  the  compositional  nature  of  the  model  checking  step. 
Compositional  model  checking  has  been  the  focus  of  considerable  research  and  we 
hope  to  leverage  the  significant  breakthroughs  that  have  emerged  from  this  effort. 

Moving  on,  we  propose  the  use  of  predicate  minimization  as  a  solution  to 
Challenge  3  (cf.  Chapter  7).  Our  approach  uses  pseudo-Boolean  constraints  to 
minimize  the  number  of  predicates  used  for  predicate  abstraction  and  thus  eliminates 
redundant predicates as new ones are discovered. We also present an action-guided
abstraction refinement scheme to address Challenge 1 (cf. Chapter 9). This
abstraction  works  by  aggregating  states  based  on  the  events  they  can  perform  and 
complements  predicate  abstraction  naturally.  Both  these  solutions  are  seamlessly 
integrated with the compositional CEGAR scheme presented earlier.

In Chapter 8 we present the logic SE-LTL, a state/event derivative of the
standard  linear  temporal  logic  LTL.  We  present  efficient  SE-LTL  model  checking 
algorithms  to  help  reason  about  state/event-based  systems.  We  also  present 
a  compositional  CEGAR  procedure  [23,26,37]  for  the  automated  verification  of 
concurrent  C  programs  against  SE-LTL  specifications.  SE-LTL  enriches  our 
specification  mechanism  by  allowing  state/event-based  liveness  properties  and  is  thus 
relevant  to  both  Challenge  2  and  Challenge  5. 

Finally,  in  Chapter  10  we  address  Challenge  6  by  presenting  a  compositional 
CEGAR  scheme  to  perform  deadlock  detection  on  concurrent  message-passing 
programs  [27].  In  summary,  the  demand  for  better  formal  techniques  to  verify 
concurrent  and  distributed  C  programs  is  currently  overwhelming.  This  dissertation 
identifies  some  notable  stumbling  blocks  in  this  endeavor  and  provides  a  road  map  to 
their solution.

Chapter 2


Preliminaries 


In  this  chapter  we  present  preliminary  notations  and  definitions  that  will  be  used  in 
the rest of the thesis. We assume a denumerable set of atomic propositions $\mathcal{AP}$.

Definition 1 (Labeled Kripke Structure) A Labeled Kripke Structure (LKS) is
a 6-tuple $(S, Init, AP, L, \Sigma, T)$ where: (i) $S$ is a non-empty set of states, (ii) $Init \subseteq S$
is a set of initial states, (iii) $AP \subseteq \mathcal{AP}$ is a finite set of atomic propositions, (iv)
$L : S \rightarrow 2^{AP}$ is a propositional labeling function that maps every state to a set of
atomic propositions that are true in that state, (v) $\Sigma$ is a set of actions, also known
as the alphabet, and (vi) $T \subseteq S \times \Sigma \times S$ is a transition relation.

Important note: In the rest of this thesis we will write $Field_{Tup}$ to mean
the field $Field$ of a tuple $Tup$. Thus, for any LKS $M = (S, Init, AP, L, \Sigma, T)$, we
will write $S_M$, $Init_M$, $AP_M$, $L_M$, $\Sigma_M$ and $T_M$ to mean $S$, $Init$, $AP$, $L$, $\Sigma$ and $T$
respectively. Also we will write $s \xrightarrow{\alpha}_M s'$ to mean $(s, \alpha, s') \in T_M$. When $M$ is clear
from the context we will write simply $s \xrightarrow{\alpha} s'$.

Example 1 Figure 2.1 shows a simple LKS $M = (S, Init, AP, L, \Sigma, T)$ with five
states $\{1, 2, 3, 4, 5\}$. The alphabet $\Sigma$ is $\{\alpha, \beta, \chi, \delta\}$ and the set of atomic propositions
is $AP = \{p, q, r\}$. Transitions are shown as arrows labeled with actions. The initial
state 1 is indicated by an incoming transition with no source state. The propositional
labellings are shown beside the respective states.


Figure 2.1: A simple LKS.

Intuitively, an LKS can model the behavior of a system in terms of both states and
events. We denote the set of all LKSs by $\mathcal{LKS}$. The successor function $Succ$ maps a
state $s$ and an action $\alpha$ to the set of $\alpha$-successors of $s$. Additionally, the propositional
successor function $PSucc$ maps a state $s$, an action $\alpha$ and a set of atomic propositions
$P$ to the (possibly empty) set of $\alpha$-successors of $s$ that are labeled with $P$. We now
present these two functions formally.

Definition 2 (Successor Functions) Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS.
The successor functions $Succ : S \times \Sigma \rightarrow 2^S$ and $PSucc : S \times \Sigma \times 2^{AP} \rightarrow 2^S$ are
defined as follows:

$$Succ(s, \alpha) = \{ s' \in S \mid s \xrightarrow{\alpha} s' \}$$

$$PSucc(s, \alpha, P) = \{ s' \in S \mid s \xrightarrow{\alpha} s' \wedge L(s') = P \}$$

Example 2 For the LKS $M$ shown in Figure 2.1, we have the following:

• $Succ(1, \alpha) = \{2, 3\}$, $Succ(1, \beta) = Succ(1, \chi) = \emptyset$.

• $Succ(2, \alpha) = Succ(2, \chi) = \emptyset$, $Succ(2, \beta) = \{4\}$.

• $PSucc(1, \alpha, \{p\}) = \emptyset$, $PSucc(1, \alpha, \{q\}) = \{2\}$, $PSucc(1, \alpha, \{p, q\}) = \emptyset$.

• $PSucc(2, \beta, \{p\}) = \emptyset$, $PSucc(2, \beta, \{q\}) = \emptyset$, $PSucc(2, \beta, \{p, q\}) = \{4\}$.
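
The two successor functions can be sketched for finite LKSs in a few lines of Python. This is a minimal illustration rather than the representation used by any tool: the class name `LKS`, its field names and the string encoding of actions are assumptions. Since the figure is not reproduced here, the example instance uses only what Example 2 states; the transition out of state 3 and the labels of states 1, 3 and 5 are invented and marked as such.

```python
from dataclasses import dataclass


@dataclass
class LKS:
    """A finite labeled Kripke structure (S, Init, AP, L, Sigma, T)."""
    states: set
    init: set
    ap: set
    label: dict      # maps each state to a frozenset of atomic propositions
    alphabet: set
    trans: set       # set of (source, action, target) triples

    def succ(self, s, a):
        """Succ(s, a): the set of a-successors of state s."""
        return {t for (u, b, t) in self.trans if u == s and b == a}

    def psucc(self, s, a, props):
        """PSucc(s, a, P): the a-successors of s labeled with exactly P."""
        return {t for t in self.succ(s, a) if self.label[t] == frozenset(props)}


M = LKS(
    states={1, 2, 3, 4, 5},
    init={1},
    ap={"p", "q", "r"},
    label={
        1: frozenset({"p"}),        # assumed, not stated in the text
        2: frozenset({"q"}),        # follows from PSucc(1, alpha, {q}) = {2}
        3: frozenset({"r"}),        # assumed, not stated in the text
        4: frozenset({"p", "q"}),   # follows from PSucc(2, beta, {p, q}) = {4}
        5: frozenset({"p", "r"}),   # assumed, not stated in the text
    },
    alphabet={"alpha", "beta", "chi", "delta"},
    trans={
        (1, "alpha", 2), (1, "alpha", 3),   # Succ(1, alpha) = {2, 3}
        (2, "beta", 4),                     # Succ(2, beta) = {4}
        (3, "chi", 5),                      # assumed, not stated in the text
    },
)

print(M.succ(1, "alpha"))
print(M.psucc(2, "beta", {"p", "q"}))
```

Under these assumptions the two calls return the sets $\{2, 3\}$ and $\{4\}$, matching the corresponding entries of Example 2.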

Note that both the set of states and the alphabet of an LKS can in general
be infinite. Also, an LKS can be non-deterministic, i.e., an LKS
$M = (S, Init, AP, L, \Sigma, T)$ could have a state $s \in S$ and an action $\alpha \in \Sigma$ such that
$|Succ(s, \alpha)| > 1$. However, $M$ is said to have finite non-determinism if for any state
$s \in S$ and any action $\alpha \in \Sigma$, the set $Succ(s, \alpha)$ is always finite. In the rest of this
thesis we will only consider LKSs with finite non-determinism.

Actions  are  used  to  model  observable  or  unobservable  behaviors  of  systems. 
Accordingly,  we  assume  that  observable  actions  are  drawn  from  a  denumerable  set 
ObsAct,  while  unobservable  (or  silent)  actions  are  drawn  from  a  denumerable  set 
$SilAct$. We assume a distinguished action $\tau \in SilAct$.

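Definition 3 (Simulation) Let $M_1 = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$ and $M_2 =
(S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$ be two LKSs such that $AP_1 = AP_2$ and $\Sigma_1 = \Sigma_2$. A
relation $\mathcal{R} \subseteq S_1 \times S_2$ is said to be a simulation relation iff it obeys the following two
conditions:

1. $\forall s_1 \in S_1 \,.\, \forall s_2 \in S_2 \,.\, s_1 \mathcal{R} s_2 \Rightarrow L_1(s_1) = L_2(s_2)$

2. $\forall s_1 \in S_1 \,.\, \forall s_2 \in S_2 \,.\, \forall \alpha \in \Sigma_1 \,.\, \forall s_1' \in S_1 \,.\,
(s_1 \mathcal{R} s_2 \wedge s_1 \xrightarrow{\alpha} s_1') \Rightarrow \exists s_2' \in S_2 \,.\, s_2 \xrightarrow{\alpha} s_2' \wedge s_1' \mathcal{R} s_2'$

We say that $M_1$ is simulated by $M_2$, and denote this by $M_1 \preccurlyeq M_2$, iff there exists
a simulation relation $\mathcal{R}$ such that the following condition holds:

$\forall s_1 \in Init_1 \,.\, \exists s_2 \in Init_2 \,.\, s_1 \mathcal{R} s_2$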


Figure 2.2: Two LKSs demonstrating simulation.


Example 3 Consider the LKSs $M_1$ and $M_2$ shown in Figure 2.2. It is clear that
$M_1 \preccurlyeq M_2$ since the following is a simulation relation that relates the pair of
initial states $(1, 6)$.

$\mathcal{R} = \{(1,6), (2,7), (3,7), (4,8), (5,9)\}$

On the other hand $M_2 \not\preccurlyeq M_1$. Intuitively this is because of state 7 of $M_2$. Note that
state 7 can do both actions $\beta$ and $\chi$, but no state of $M_1$ can do both $\beta$ and $\chi$. Hence
no state of $M_1$ can correspond to state 7 in accordance with a simulation relation.
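
For finite LKSs, simulation can be decided by starting from all label-compatible pairs and repeatedly discarding pairs that violate condition 2 of Definition 3. The sketch below does this naively. The transition sets are an assumed reconstruction of Figure 2.2 (the figure is not reproduced here), chosen to agree with the relation and the discussion in Example 3; all labels are assumed empty.

```python
def greatest_simulation(S1, S2, L1, L2, T1, T2):
    """Largest relation within S1 x S2 obeying both simulation conditions."""
    # Condition 1: related states carry the same propositional label.
    R = {(s1, s2) for s1 in S1 for s2 in S2 if L1[s1] == L2[s2]}
    changed = True
    while changed:
        changed = False
        for (s1, s2) in list(R):
            # Condition 2: every move of s1 must be matched by a move of s2.
            for (src, a, t1) in T1:
                if src != s1:
                    continue
                if not any(u == s2 and b == a and (t1, t2) in R
                           for (u, b, t2) in T2):
                    R.discard((s1, s2))
                    changed = True
                    break
    return R


def simulated_by(S1, I1, L1, T1, S2, I2, L2, T2):
    """True iff every initial state of M1 is related to some initial state of M2."""
    R = greatest_simulation(S1, S2, L1, L2, T1, T2)
    return all(any((s1, s2) in R for s2 in I2) for s1 in I1)


# Assumed reconstruction of Figure 2.2.
S1, I1 = {1, 2, 3, 4, 5}, {1}
T1 = {(1, "alpha", 2), (1, "alpha", 3), (2, "beta", 4), (3, "chi", 5)}
L1 = {s: frozenset() for s in S1}
S2, I2 = {6, 7, 8, 9}, {6}
T2 = {(6, "alpha", 7), (7, "beta", 8), (7, "chi", 9)}
L2 = {s: frozenset() for s in S2}

m1_below_m2 = simulated_by(S1, I1, L1, T1, S2, I2, L2, T2)
m2_below_m1 = simulated_by(S2, I2, L2, T2, S1, I1, L1, T1)
```

Under these assumptions `m1_below_m2` holds and `m2_below_m1` does not, because neither state 2 nor state 3 can match both moves of state 7.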

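Definition 4 (Weak Simulation) Let $M_1 = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$ and $M_2 =
(S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$ be two LKSs such that $AP_1 = AP_2$ and $\Sigma_1 = \Sigma_2 \cup \{\tau\}$. A
relation $\mathcal{R} \subseteq S_1 \times S_2$ is said to be a weak simulation relation iff it obeys the following
three conditions:

1. $\forall s_1 \in S_1 \,.\, \forall s_2 \in S_2 \,.\, s_1 \mathcal{R} s_2 \Rightarrow L_1(s_1) = L_2(s_2)$

2. $\forall s_1 \in S_1 \,.\, \forall s_2 \in S_2 \,.\, \forall s_1' \in S_1 \,.\,
(s_1 \mathcal{R} s_2 \wedge s_1 \xrightarrow{\tau} s_1') \Rightarrow s_1' \mathcal{R} s_2 \vee \exists s_2' \in S_2 \,.\, s_2 \xrightarrow{\tau} s_2' \wedge s_1' \mathcal{R} s_2'$

3. $\forall s_1 \in S_1 \,.\, \forall s_2 \in S_2 \,.\, \forall \alpha \in \Sigma_1 \setminus \{\tau\} \,.\, \forall s_1' \in S_1 \,.\,
(s_1 \mathcal{R} s_2 \wedge s_1 \xrightarrow{\alpha} s_1') \Rightarrow \exists s_2' \in S_2 \,.\, s_2 \xrightarrow{\alpha} s_2' \wedge s_1' \mathcal{R} s_2'$

Note that if $\tau \notin \Sigma_2$ then condition 2 above is equivalent to the following:

$\forall s_1 \in S_1 \,.\, \forall s_2 \in S_2 \,.\, \forall s_1' \in S_1 \,.\, (s_1 \mathcal{R} s_2 \wedge s_1 \xrightarrow{\tau} s_1') \Rightarrow s_1' \mathcal{R} s_2$

We say that $M_1$ is weakly simulated by $M_2$, and denote this by $M_1 \precsim M_2$, iff there
exists a weak simulation relation $\mathcal{R}$ such that the following condition holds:

$\forall s_1 \in Init_1 \,.\, \exists s_2 \in Init_2 \,.\, s_1 \mathcal{R} s_2$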


Figure 2.3: Two LKSs demonstrating weak simulation.


Example 4 Consider the LKSs $M_1$ and $M_2$ shown in Figure 2.3. It is clear that
$M_1 \precsim M_2$ since the following is a weak simulation relation that relates the pair of
initial states $(1, 6)$.

$\mathcal{R} = \{(1,6), (2,6), (3,6), (4,7), (5,8)\}$




On the other hand $M_2 \not\precsim M_1$. Intuitively this is because of state 6 of $M_2$. Note that
state 6 can do both actions $\alpha$ and $\beta$, but no state of $M_1$ can do both $\alpha$ and $\beta$. Hence no
state of $M_1$ can correspond to state 6 in accordance with a weak simulation relation.


Note the important difference between simulation and weak simulation. Clearly
$M_1 \not\preccurlyeq M_2$ since the initial state 1 of $M_1$ can do the $\tau$ action while the initial state
6 of $M_2$ cannot. In general, weak simulation allows $M_2$ to simply ignore $\tau$ actions
performed by $M_1$.
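
The same fixpoint idea extends to weak simulation: a $\tau$ move of $M_1$ may additionally be matched by $M_2$ staying where it is. The sketch below is a naive illustration, not the procedure used later in the thesis. The transition sets are an assumed reconstruction of Figure 2.3 consistent with Example 4, with all labels assumed empty, and the string `"tau"` is a hypothetical stand-in for $\tau$.

```python
TAU = "tau"  # hypothetical name standing in for the distinguished action tau


def greatest_weak_simulation(S1, S2, L1, L2, T1, T2):
    """Largest relation within S1 x S2 obeying conditions 1 to 3 of Definition 4."""
    R = {(s1, s2) for s1 in S1 for s2 in S2 if L1[s1] == L2[s2]}

    def ok(s1, s2):
        for (src, a, t1) in T1:
            if src != s1:
                continue
            # Conditions 2 and 3: s2 answers with the same action ...
            matched = any(u == s2 and b == a and (t1, t2) in R
                          for (u, b, t2) in T2)
            # ... or, for tau only, s2 ignores the move and stays put.
            if a == TAU:
                matched = matched or (t1, s2) in R
            if not matched:
                return False
        return True

    changed = True
    while changed:
        bad = {pair for pair in R if not ok(*pair)}
        R -= bad
        changed = bool(bad)
    return R


# Assumed reconstruction of Figure 2.3.
S1 = {1, 2, 3, 4, 5}
T1 = {(1, TAU, 2), (1, TAU, 3), (2, "alpha", 4), (3, "beta", 5)}
L1 = {s: frozenset() for s in S1}
S2 = {6, 7, 8}
T2 = {(6, "alpha", 7), (6, "beta", 8)}
L2 = {s: frozenset() for s in S2}

R = greatest_weak_simulation(S1, S2, L1, L2, T1, T2)
```

With these assumed transitions the computed relation contains the relation of Example 4, including the pair of initial states $(1, 6)$.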

Our notion of weak simulation is derived from that of CCS [85] and differs critically
from notions of weak simulation presented by others [39]. In particular, our notion of
weak simulation does not preserve liveness properties since it is completely insensitive
to (even infinite sequences of) $\tau$ actions. For example, consider the two LKSs
shown in Figure 2.4. Clearly $M_1 \precsim M_2$. Now consider the liveness property $\phi$ that
"eventually action $\alpha$ occurs". Clearly $M_2$ satisfies $\phi$ while $M_1$ does not. However the
non-preservation of liveness properties by our notion of weak simulation will not be
a problem since we will use weak simulation only for compositional validation of
counterexamples. Further details will be presented in Chapter 6.


Figure  2.4:  Two  LKSs  showing  that  weak  simulation  does  not  preserve  liveness  properties. 
The following two results about simulation are well-known and will be used
crucially in the later part of the thesis.

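Theorem 1 (Transitivity) The following statement is valid:

$\forall M_1 \in \mathcal{LKS} \,.\, \forall M_2 \in \mathcal{LKS} \,.\, \forall M_3 \in \mathcal{LKS} \,.\, M_1 \preccurlyeq M_2 \wedge M_2 \preccurlyeq M_3 \Rightarrow M_1 \preccurlyeq M_3$

Proof. Let $M_1 = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$, $M_2 = (S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$ and
$M_3 = (S_3, Init_3, AP_3, L_3, \Sigma_3, T_3)$. Let $\mathcal{R}_{12} \subseteq S_1 \times S_2$ be a simulation relation such
that $\forall s_1 \in Init_1 \,.\, \exists s_2 \in Init_2 \,.\, (s_1, s_2) \in \mathcal{R}_{12}$ and $\mathcal{R}_{23} \subseteq S_2 \times S_3$ be a simulation relation
such that $\forall s_2 \in Init_2 \,.\, \exists s_3 \in Init_3 \,.\, (s_2, s_3) \in \mathcal{R}_{23}$. Define a relation $\mathcal{R}_{13} \subseteq S_1 \times S_3$
as follows:

$\mathcal{R}_{13} = \{(s_1, s_3) \mid \exists s_2 \,.\, (s_1, s_2) \in \mathcal{R}_{12} \wedge (s_2, s_3) \in \mathcal{R}_{23}\}$

Show that $\mathcal{R}_{13}$ is a simulation relation and $\forall s_1 \in Init_1 \,.\, \exists s_3 \in Init_3 \,.\, (s_1, s_3) \in \mathcal{R}_{13}$.

□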


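Theorem 2 (Witness) The following statement is valid:

$\forall M_1 \in \mathcal{LKS} \,.\, \forall M_2 \in \mathcal{LKS} \,.\, \forall M_3 \in \mathcal{LKS} \,.\, M_1 \preccurlyeq M_2 \wedge M_1 \not\preccurlyeq M_3 \Rightarrow M_2 \not\preccurlyeq M_3$

Proof. This is a direct consequence of Theorem 1: if $M_2 \preccurlyeq M_3$ held, then
$M_1 \preccurlyeq M_2$ and transitivity would give $M_1 \preccurlyeq M_3$.

□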

Theorem 1 states that simulation is a transitive relation on $\mathcal{LKS}$. Theorem 2
provides a notion of a witness to the absence of simulation between two LKSs.
Essentially it states that an LKS $M_1$ is a witness to $M_2 \not\preccurlyeq M_3$ iff $M_1$ is simulated
by $M_2$ but not simulated by $M_3$. We will use a witness LKS as a counterexample to
simulation later in this thesis.
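Definition 5 (Parallel Composition) Let $M_1 = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$ and
$M_2 = (S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$ be two LKSs such that $AP_1 \cap AP_2 = \emptyset$.
Then the parallel composition of $M_1$ and $M_2$, denoted by $M_1 \parallel M_2$, is an LKS
$(S_\parallel, Init_\parallel, AP_\parallel, L_\parallel, \Sigma_\parallel, T_\parallel)$ such that: (i) $S_\parallel = S_1 \times S_2$, (ii) $Init_\parallel = Init_1 \times Init_2$,
(iii) $AP_\parallel = AP_1 \cup AP_2$, (iv) $\Sigma_\parallel = \Sigma_1 \cup \Sigma_2$, and the state labeling function and
transition relation are defined as follows:

• $\forall s_1 \in S_1 \,.\, \forall s_2 \in S_2 \,.\, L_\parallel(s_1, s_2) = L_1(s_1) \cup L_2(s_2)$.

• $\forall s_1 \in S_1 \,.\, \forall s_2 \in S_2 \,.\, \forall \alpha \notin \Sigma_2 \,.\, \forall s_1' \in S_1 \,.\,
s_1 \xrightarrow{\alpha} s_1' \Rightarrow (s_1, s_2) \xrightarrow{\alpha} (s_1', s_2)$

• $\forall s_1 \in S_1 \,.\, \forall s_2 \in S_2 \,.\, \forall \alpha \notin \Sigma_1 \,.\, \forall s_2' \in S_2 \,.\,
s_2 \xrightarrow{\alpha} s_2' \Rightarrow (s_1, s_2) \xrightarrow{\alpha} (s_1, s_2')$

• $\forall s_1 \in S_1 \,.\, \forall s_2 \in S_2 \,.\, \forall \alpha \in \Sigma_1 \cap \Sigma_2 \,.\, \forall s_1' \in S_1 \,.\, \forall s_2' \in S_2 \,.\,
s_1 \xrightarrow{\alpha} s_1' \wedge s_2 \xrightarrow{\alpha} s_2' \Rightarrow (s_1, s_2) \xrightarrow{\alpha} (s_1', s_2')$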

Example 5 Figure 2.5 shows two LKSs $M_1$ and $M_2$ and their parallel composition
$M_1 \parallel M_2$. Note that in general parallel composition leads to more states and
transitions.

Figure 2.5: Three LKSs demonstrating parallel composition.

The above notion of parallel composition is central to our approach. We assume
that when several components are executed concurrently, they synchronize on shared
actions and proceed independently on local actions. We will see later that the LKSs
we compose will not contain the $\tau$ action in their alphabet. Hence we do not need
to define parallel composition specially for $\tau$. This notion of parallel composition
has been used in, e.g., CSP [68,69,100], and by Anantharaman et al. [3]. Parallel
composition is commutative and associative. In addition, both simulation and weak
simulation are congruences with respect to parallel composition, as stated by the
following result.

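Theorem 3 Let $M_1$, $M_1'$, $M_2$ and $M_2'$ be LKSs. Then the following holds:

$M_1 \preccurlyeq M_1' \wedge M_2 \preccurlyeq M_2' \Rightarrow (M_1 \parallel M_2) \preccurlyeq (M_1' \parallel M_2')$

$M_1 \precsim M_1' \wedge M_2 \precsim M_2' \Rightarrow (M_1 \parallel M_2) \precsim (M_1' \parallel M_2')$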
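
Parallel composition as given in Definition 5 can be sketched directly for finite LKSs. The dictionary layout and the two small LKSs `m1` and `m2` below are hypothetical (they are not the LKSs of Figure 2.5); `sync` is the only shared action, so it is the only action on which the components must move together.

```python
from itertools import product


def compose(m1, m2):
    """Parallel composition of two LKSs given as dicts with keys
    'states', 'init', 'ap', 'label', 'alphabet', 'trans' (assumed layout)."""
    assert not (m1["ap"] & m2["ap"]), "AP_1 and AP_2 must be disjoint"
    shared = m1["alphabet"] & m2["alphabet"]
    trans = set()
    for (s1, a, t1) in m1["trans"]:
        if a not in shared:
            # Local action of M1: M2 stays in its current state.
            trans |= {((s1, s2), a, (t1, s2)) for s2 in m2["states"]}
        else:
            # Shared action: both components must move together.
            trans |= {((s1, s2), a, (t1, t2))
                      for (s2, b, t2) in m2["trans"] if b == a}
    for (s2, a, t2) in m2["trans"]:
        if a not in shared:
            # Local action of M2: M1 stays in its current state.
            trans |= {((s1, s2), a, (s1, t2)) for s1 in m1["states"]}
    return {
        "states": set(product(m1["states"], m2["states"])),
        "init": set(product(m1["init"], m2["init"])),
        "ap": m1["ap"] | m2["ap"],
        "label": {(s1, s2): m1["label"][s1] | m2["label"][s2]
                  for s1 in m1["states"] for s2 in m2["states"]},
        "alphabet": m1["alphabet"] | m2["alphabet"],
        "trans": trans,
    }


# Two hypothetical components sharing only the action "sync".
m1 = {"states": {0, 1}, "init": {0}, "ap": {"p"},
      "label": {0: frozenset(), 1: frozenset({"p"})},
      "alphabet": {"a", "sync"}, "trans": {(0, "a", 1), (1, "sync", 0)}}
m2 = {"states": {0, 1}, "init": {0}, "ap": {"q"},
      "label": {0: frozenset(), 1: frozenset({"q"})},
      "alphabet": {"b", "sync"}, "trans": {(0, "b", 1), (1, "sync", 0)}}

m12 = compose(m1, m2)
```

In `m12`, the composite state `(1, 0)` has no outgoing `sync` transition: the first component is ready to synchronize but the second is not, so the shared action is blocked.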
Chapter 3


C  Programs 


In  this  chapter  we  provide  a  formalization  of  a  C  component  and  a  C  program.  Our 
goal  is  to  present  formally  the  syntax  and  semantics  of  components  and  programs. 
The  semantics  will  be  defined  in  an  operational  manner  using  LKSs.  The  correctness 
of  the  rest  of  this  thesis  depends  critically  on  this  semantics.  In  particular,  this 
semantics  will  be  used  in  the  next  chapter  to  show  that  the  models  we  construct  from 
C  programs  via  predicate  abstraction  are  conservative  in  a  very  precise  sense. 

In  the  rest  of  this  thesis  we  will  write  program  and  component  to  mean  a  C 
program  and  a  C  component,  respectively.  Each  component  will  correspond  to  a  single 
non-recursive  C  procedure.  This  procedure  will  be  obtained  by  automatically  inlining 
all  library  routines  except  the  ones  used  for  communication  and  synchronization  with 
other  components.  This  is  always  possible  if  the  component  is  non-recursive  and  the 
source  code  for  library  routines  to  be  inlined  is  available.  We  assume  that  components 
of  a  C  program  communicate  with  each  other  via  blocking  message-passing.  We  will 
also  assume  several  other  restrictions  on  our  C  programs.  We  discuss  these  restrictions 
further  in  Section  3.3. 
We assume that all program variables are drawn from a denumerable set $Var$ and
that structure fields are drawn from a denumerable set $Field$. We will write $Dom(\varphi)$
and $Range(\varphi)$ to denote the domain and range respectively of a function $\varphi$. We will
write $D \hookrightarrow R$ to denote a partial function from a domain $D$ to a range $R$.

We will assume that every variable and address value is drawn from a domain $D$.
The only requirement on this domain $D$ is that the familiar arithmetic, bitwise and
logical operators be defined on it. For instance, $D$ could be the set of 32-bit integers.
Furthermore, every variable and structure field has a type. The set of types is denoted
by $Type$ and consists of a single base type $D$ and struct types. An object of struct
type is a record containing fields of other types. Therefore, the set of types can be
defined by the following BNF grammar:

$Type := D \mid \mathbf{struct}\,(Field \times Type)^{+}$

We denote the type of any variable $v$ by $Type(v)$. We assume an injective address
function $Address$ whose domain is $Var \cup (D \times Field)$, whose range is $D$, and which
obeys the following additional constraints:

• Let $v$ be any variable such that $Type(v) = \mathbf{struct}((f_1, t_1), \ldots, (f_k, t_k))$. Then:

$\forall 1 \le i \le k \,.\, (Address(v), f_i) \in Dom(Address)$

This ensures that for any structure variable $v$ with a field $f$, $Address$ assigns
an address to the location $v.f$.

• Let $f$ be any field such that (i) $\exists z \in D \,.\, (z, f) \in Dom(Address)$ and (ii)
$Type(f) = \mathbf{struct}((f_1, t_1), \ldots, (f_k, t_k))$. Then:

$\forall 1 \le i \le k \,.\, (Address((z, f)), f_i) \in Dom(Address)$

This ensures that for every location $v.f$ which is itself a structure with a field
$f'$, $Address$ assigns an address to the location $v.f.f'$.

Intuitively,  the  above  two  constraints  ensure  that  every  location  has  a  well-defined 
address.  Then  a  store  is  simply  a  mapping  from  addresses  to  values.  A  store  is 
intended  to  model  the  memory  configuration  at  any  point  during  the  execution  of  a 
C  program. 

Definition 6 (Store) A store is a mapping $\sigma : D \to D$ from addresses to values.
The set of all stores is denoted by $Store$.


3.1  Expressions 

Let  Expr  denote  the  set  of  all  side-effect  free  expressions  over  Var.  In  addition, 
expressions  (such  as  variables  and  structure  fields)  that  correspond  to  some  valid 
memory  location  are  called  lvalues  and  form  a  subset  of  Expr  denoted  by  LValue. 
Intuitively  an  lvalue  is  an  expression  on  which  the  address-of  (&)  operator  can  be 
applied. The syntax of $Expr$ and $LValue$ is given by the BNF grammars shown
in Table 3.1.

3.1.1  Expression  Evaluation 

The function $Val : Store \times Expr \hookrightarrow D$ maps a store $\sigma$ and an expression $e$ to the
evaluation of $e$ under $\sigma$. Similarly the function $Add : Store \times LValue \to D$ maps
a store $\sigma$ and an lvalue $l$ to the address of $l$ under $\sigma$. $Add$ is defined inductively as
shown in Table 3.2. Similarly, $Val$ is defined inductively as shown in Table 3.3. The
definition of $Val$ requires the following functions over integers:
Expr   := D | Var | LValue.Field
        | & LValue | * Expr
        | - Expr | Expr + Expr | Expr - Expr
        | Expr * Expr | Expr / Expr
        | ! Expr | Expr && Expr | Expr || Expr
        | ~ Expr | Expr & Expr | Expr | Expr
        | Expr ^ Expr | Expr << Expr | Expr >> Expr

LValue := Var | * Expr | LValue.Field

Table 3.1: BNF grammars for Expr and LValue.

• $+$, $-$, $\times$, $\div$ are the standard arithmetic functions of type $D \times D \to D$.

• $Neg : D \to D$ is the logical negation function that maps 0 to 1 and any non-zero
integer to 0.

• $And : D \times D \to D$ is the logical conjunction function that maps any pair of
non-zero integers to 1 and any other pair of integers to 0.

• $Or : D \times D \to D$ is the logical disjunction function that maps the pair $(0, 0)$ to
0 and any other pair of integers to 1.

• $BNeg : D \to D$ is the bitwise negation function.

• $BAnd$, $BOr$, $BXor$, $BLsh$, and $BRsh$ are the bitwise AND, OR, exclusive-OR,
left-shift and right-shift functions of type $D \times D \to D$.

Example 6 Let $v$ be a variable. Let us denote the address of $v$, i.e., $Address(v)$,
by $A$. Let $\sigma$ be a store such that $\sigma(A) = 5$. Then we want the expression
$(*\,\&\,v)$ to evaluate to 5 under the store $\sigma$. Let us see how this happens. First,
$Add(\sigma, v) = Address(v) = A$. Hence, $Val(\sigma, \&\,v) = Add(\sigma, v) = A$. Finally,
$Val(\sigma, *\,\&\,v) = \sigma(Val(\sigma, \&\,v)) = \sigma(A) = 5$, which is what we want.


$Add(\sigma, v) = Address(v)$
$Add(\sigma, *\,e) = Val(\sigma, e)$
$Add(\sigma, e.f) = Address(Add(\sigma, e), f)$

Table 3.2: Definition of function Add which maps lvalues to addresses. Note that the function
Address takes either a single variable argument or a pair of arguments, the first of which is an
element of D and the second is a structure field.


$Val(\sigma, v) = \sigma(Add(\sigma, v))$
$Val(\sigma, z) = z$
$Val(\sigma, e.f) = \sigma(Add(\sigma, e.f))$
$Val(\sigma, -\,e) = 0 - Val(\sigma, e)$
$Val(\sigma, {\sim}\,e) = BNeg(Val(\sigma, e))$
$Val(\sigma, !\,e) = Neg(Val(\sigma, e))$
$Val(\sigma, \&\,e) = Add(\sigma, e)$
$Val(\sigma, *\,e) = \sigma(Val(\sigma, e))$
$Val(\sigma, e_1 + e_2) = Val(\sigma, e_1) + Val(\sigma, e_2)$
$Val(\sigma, e_1 - e_2) = Val(\sigma, e_1) - Val(\sigma, e_2)$
$Val(\sigma, e_1 * e_2) = Val(\sigma, e_1) \times Val(\sigma, e_2)$
$Val(\sigma, e_1 / e_2) = Val(\sigma, e_1) \div Val(\sigma, e_2)$
$Val(\sigma, e_1 \,\&\&\, e_2) = Val(\sigma, e_1)\ And\ Val(\sigma, e_2)$
$Val(\sigma, e_1 \,\|\, e_2) = Val(\sigma, e_1)\ Or\ Val(\sigma, e_2)$
$Val(\sigma, e_1 \,\&\, e_2) = Val(\sigma, e_1)\ BAnd\ Val(\sigma, e_2)$
$Val(\sigma, e_1 \mid e_2) = Val(\sigma, e_1)\ BOr\ Val(\sigma, e_2)$
$Val(\sigma, e_1 \,\texttt{\^{}}\, e_2) = Val(\sigma, e_1)\ BXor\ Val(\sigma, e_2)$
$Val(\sigma, e_1 \ll e_2) = Val(\sigma, e_1)\ BLsh\ Val(\sigma, e_2)$
$Val(\sigma, e_1 \gg e_2) = Val(\sigma, e_1)\ BRsh\ Val(\sigma, e_2)$

Table 3.3: Definition of function Val which maps expressions to values.
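
The mutually recursive definitions of Add and Val can be sketched as a small interpreter. Everything about the encoding below is an assumption made for illustration: expressions are nested tuples, `address` is a dictionary playing the role of the function Address, a store is a dictionary from addresses to values, and $D$ is taken to be Python's unbounded integers rather than, say, 32-bit integers. Division is omitted because its rounding behavior depends on the choice of $D$.

```python
# Assumed expression encoding:
#   ("const", z), ("var", v), ("field", e, f), ("addr", e), ("deref", e),
#   ("un", op, e), ("bin", op, e1, e2)
UNARY = {
    "-": lambda x: 0 - x,
    "~": lambda x: ~x,                # BNeg
    "!": lambda x: int(x == 0),       # Neg
}
BINARY = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "&&": lambda x, y: int(x != 0 and y != 0),   # And
    "||": lambda x, y: int(x != 0 or y != 0),    # Or
    "&": lambda x, y: x & y,
    "|": lambda x, y: x | y,
    "^": lambda x, y: x ^ y,
    "<<": lambda x, y: x << y,
    ">>": lambda x, y: x >> y,
}


def add(address, store, e):
    """Add(sigma, l): the address of lvalue l, following Table 3.2."""
    kind = e[0]
    if kind == "var":
        return address[e[1]]
    if kind == "deref":
        return val(address, store, e[1])
    if kind == "field":
        return address[(add(address, store, e[1]), e[2])]
    raise ValueError("not an lvalue: %r" % (e,))


def val(address, store, e):
    """Val(sigma, e): the value of expression e, following Table 3.3."""
    kind = e[0]
    if kind == "const":
        return e[1]
    if kind in ("var", "field"):
        return store[add(address, store, e)]
    if kind == "addr":
        return add(address, store, e[1])
    if kind == "deref":
        return store[val(address, store, e[1])]
    if kind == "un":
        return UNARY[e[1]](val(address, store, e[2]))
    if kind == "bin":
        return BINARY[e[1]](val(address, store, e[2]), val(address, store, e[3]))
    raise ValueError("unknown expression: %r" % (e,))


# Example 6 with the hypothetical address A = 100, plus a struct variable s
# whose field f lives at the hypothetical address 201.
address = {"v": 100, "s": 200, (200, "f"): 201}
store = {100: 5, 201: 7}
star_amp_v = val(address, store, ("deref", ("addr", ("var", "v"))))
s_dot_f = val(address, store, ("field", ("var", "s"), "f"))
```

Here `star_amp_v` reproduces the computation of Example 6: the address-of step returns 100, and the dereference then reads the store at that address.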


3.1.2  Expressions  as  Formulas 

As  we  have  seen  before,  expressions  in  C  always  evaluate  to  integers.  The  ANSI  C 
standard  does  not  define  a  separate  class  of  Boolean  expressions  for  use  in  contexts 
where  a  Boolean  value  is  required,  e.g.,  in  branch  conditions.  Instead  C  adopts  the 
following  convention  to  handle  such  situations.  The  integer  zero  represents  the  truth 
value FALSE while any non-zero integer represents TRUE. This means that whenever
an expression e is used in a Boolean context, it is implicitly compared to zero and
promoted to true if it is non-zero and to false otherwise.

In  the  rest  of  this  thesis  we  will  follow  the  same  convention  and  use  C  expressions 
freely  even  in  situations  where  formulas  are  normally  expected.  For  instance,  we 
will  soon  define  the  weakest  precondition  operator  which  takes  a  C  expression 
and  an  assignment  statement  as  arguments  and  returns  a  C  expression.  In  the 
literature,  weakest  preconditions  have  been  traditionally  defined  for  formulas  and  not 
expressions.  In  the  case  of  C  however,  expressions  can  also  be  interpreted  as  formulas 
as  we  have  just  seen.  Hence  a  weakest  precondition  operator  on  C  expressions  makes 
perfect  sense. 

If you find this use of expressions in the place of formulas unsettling, it might help
to mentally convert expressions to formulas by comparing with zero. For example,
if you see the sentence "the weakest precondition of the expression $e$ with respect to
the assignment statement $a$", read it instead as the following sentence: "the weakest
precondition of the formula $e \neq 0$ with respect to the assignment statement $a$". In the rest
of this thesis we will usually omit such explicit comparisons with zero for the sake of
brevity.

We will carry this idea of the promotion of C expressions to formulas a little bit
further. Recall that a C expression $e$ evaluates to $Val(\sigma, e)$ under a store $\sigma$. Since $e$
can be viewed as a formula, we can also view $\sigma$ as an interpretation in the traditional
logical sense. Thus, for instance, we can say that $\sigma$ satisfies (is a model of) $e$ iff
$Val(\sigma, e) \neq 0$. We will denote the satisfaction of an expression $e$ by a store $\sigma$ by
$\sigma \models e$ to make this correspondence even more explicit.
3.1.3 Propositions and Expressions


We wish to think of atomic propositions and expressions as counterparts in the
abstract and concrete domains. Intuitively, a proposition is an abstract representative
of its corresponding expression while an expression is a concrete version of its
corresponding proposition. To make this notion more formal, recall that atomic
propositions are drawn from a denumerable set $\mathcal{AP}$ while the set of expressions $Expr$
is also denumerable. The correspondence between the abstract propositions and the
concrete expressions is captured by a concretization bijection $Concrete : \mathcal{AP} \to
Expr$. In this chapter, we will use the correspondence between propositions and
expressions to determine which atomic propositions hold in a concrete state of a
component. In Chapter 4 we will use it additionally to present predicate abstraction.

3.2  Component 

At a very high level, a component can be thought of as the control flow graph of a C
procedure such as the one shown in Figure 3.1. Thus, it is essentially a directed graph
whose nodes correspond to statements, and whose edges model the possible flow of
control between statements. Every statement of a component has a type drawn from
the set $T = \{ASGN, CALL, BRAN, EXIT\}$. Intuitively, ASGN represents an assignment
statement,  CALL  represents  the  invocation  of  a  library  routine,  BRAN  represents  an 
if-then-else  branch  statement,  and  EXIT  represents  the  exit  point  of  a  component’s 
execution.  Also,  statements  are  associated  with  branch  conditions,  left  and  right 
hand  sides,  and  with  appropriate  successor  statements.  We  now  define  a  component 
formally. 

void component() {
    int x, y, z;
    x = y;
    if (z) {
        if (x) alpha();
        else chi();
    } else {
        if (y) beta();
        else delta();
    }
}

Figure 3.1: Syntax of a simple component. We will use this as a running example.

Definition 7 (Component) A component is a tuple with eight components
$(Stmt, Type, entry, Cond, LHS, RHS, Then, Else)$ where: (i) $Stmt$ is a finite
non-empty set of statements, (ii) $Type : Stmt \to T$ is a function mapping each
statement to a type, (iii) $entry \in Stmt$ is the initial or entry statement, (iv) $Cond$ is
a partial function of type $Stmt \hookrightarrow Expr$ which maps branch statements to their
branch conditions, (v) $RHS$ is a partial function of type $Stmt \hookrightarrow Expr$ which
maps assignments to their right-hand-sides, (vi) $LHS$ is a partial function of type
$Stmt \hookrightarrow LValue$ which maps assignments to their left-hand-sides, and (vii) $Then$
and $Else$ are partial successor functions of type $Stmt \hookrightarrow Stmt$ which map statements
to their then and else successors respectively.

Let $C$ be a component. In order to be valid, $C$ must satisfy certain sanity
conditions. For instance, the type-labeling function $Type_C$ must obey the following
condition:

• (COMP1) There is exactly one exit statement.

$\exists s \in Stmt_C \,.\, Type_C(s) = EXIT \wedge \forall s' \in Stmt_C \,.\, Type_C(s') = EXIT \Rightarrow s' = s$

Moreover the expression-labeling functions $Cond_C$, $LHS_C$ and $RHS_C$ must obey
the following conditions:

• (COMP2) The if-then-else statements and only the if-then-else statements
have branch conditions.

$Dom(Cond_C) = \{s \in Stmt_C \mid Type_C(s) = BRAN\}$

• (COMP3) The assignment statements and only the assignment statements
have left-hand-sides and right-hand-sides.

$Dom(LHS_C) = Dom(RHS_C) = \{s \in Stmt_C \mid Type_C(s) = ASGN\}$

Let us denote the set of call statements (statements of type CALL) of $C$ by $Call(C)$.

$Call(C) = \{s \in Stmt_C \mid Type_C(s) = CALL\}$

Note  that  we  have  disallowed  library  routine  calls  that  return  values.  This  is 
because,  as  mentioned  before,  in  our  framework  a  library  routine  is  expected  to 
perform  externally  observable  actions  without  altering  the  data  state  of  the  program. 
Finally, the successor-labeling functions $Then_C$ and $Else_C$ must obey the following
conditions:

• (COMP4) Every statement except the exit point has a then successor.

$Dom(Then_C) = \{s \in Stmt_C \mid Type_C(s) \neq EXIT\}$

• (COMP5) The if-then-else statements and only the if-then-else statements
have else successors.

$Dom(Else_C) = \{s \in Stmt_C \mid Type_C(s) = BRAN\}$

In the rest of this thesis we will only consider valid components, i.e., components
that obey conditions COMP1 through COMP5.
[Control flow graph with nodes 1 to 9. Node labels: LHS(1) = x, RHS(1) = y,
Type(1) = ASGN, Type(2) = BRAN, Cond(2) = z, Else(2) = 4, Type(9) = EXIT.]

Figure 3.2: Component for the C procedure shown in Figure 3.1. We will use this as a running
example.

Example 7 Figure 3.2 shows the component $C$ corresponding to the C procedure
shown in Figure 3.1. The set of statements of $C$ is $Stmt_C = \{1, \ldots, 9\}$ and its
initial statement is $entry_C = 1$. Some of the statements are labeled with their types,
associated expressions and successors. The set of library routines invoked by $C$ is
$\{alpha, beta, chi, delta\}$. Note that $C$ satisfies conditions COMP1 through COMP5.
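
The validity conditions translate into simple checks on the domains of the labeling functions. The sketch below encodes the running example as a Python dictionary and tests COMP1 through COMP5. The dictionary layout is hypothetical, and the successor entries not printed in Figure 3.2 (everything other than Else(2) = 4) are assumptions inferred from the C code of Figure 3.1 and from the call statements 5 to 8 named in Example 9.

```python
ASGN, CALL, BRAN, EXIT = "ASGN", "CALL", "BRAN", "EXIT"

# Assumed encoding of the component of Figure 3.2; partial functions are dicts.
component = {
    "stmts": set(range(1, 10)),
    "type": {1: ASGN, 2: BRAN, 3: BRAN, 4: BRAN,
             5: CALL, 6: CALL, 7: CALL, 8: CALL, 9: EXIT},
    "entry": 1,
    "cond": {2: "z", 3: "x", 4: "y"},
    "lhs": {1: "x"},
    "rhs": {1: "y"},
    "then": {1: 2, 2: 3, 3: 5, 4: 7, 5: 9, 6: 9, 7: 9, 8: 9},
    "else": {2: 4, 3: 6, 4: 8},
}


def of_type(c, t):
    """All statements of c whose type is t."""
    return {s for s in c["stmts"] if c["type"][s] == t}


def is_valid(c):
    """Check conditions COMP1 through COMP5 on a component."""
    return (
        len(of_type(c, EXIT)) == 1                                  # COMP1
        and set(c["cond"]) == of_type(c, BRAN)                      # COMP2
        and set(c["lhs"]) == set(c["rhs"]) == of_type(c, ASGN)      # COMP3
        and set(c["then"]) == c["stmts"] - of_type(c, EXIT)         # COMP4
        and set(c["else"]) == of_type(c, BRAN)                      # COMP5
    )


call_stmts = of_type(component, CALL)                 # Call(C)
broken = {**component, "else": {2: 4}}                # drops Else(3), Else(4)
```

Representing each partial function as a dictionary makes the domain conditions direct set comparisons on the dictionary keys; `broken` violates COMP5 because two branch statements lack else successors.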

3.2.1  Component  Semantics 

In  this  section,  we  will  present  the  concrete  semantics  of  a  component  in  terms 
of  an  LKS,  which  we  will  refer  to  as  a  semantic  LKS.  Intuitively,  a  component 
C  by  itself  only  represents  the  control  flow  structure  along  with  the  assignments, 
branch  conditions  and  library  routines  that  are  invoked  at  each  control  point.  In 
order  to  describe  the  semantic  LKS,  we  need  information  about  the  behavior  of  the 
library  routines  invoked  by  C  and  information  about  the  initial  states,  set  of  atomic 
propositions,  and  alphabet  of  the  semantic  LKS.  This  information  will  be  provided 
by  a  context.  Therefore  we  will  first  describe  contexts  before  going  into  the  details  of 
the semantic LKS.


In  general,  a  library  routine  may  perform  externally  observable  actions  and  also 
alter  the  data  state  of  a  component.  Additionally,  such  behavior  may  be  guarded  by 
certain  conditions  on  the  data  state  of  the  component.  Thus,  we  need  a  formalism 
that  can  finitely  summarize  the  behavior  of  a  routine,  and  yet  is  powerful  enough 
to express guarded actions and assignments. The natural alternative is an extended
finite state machine, which is essentially a finite automaton whose transitions are labeled
with guarded commands, where a command is either an action or an assignment. We
now present extended finite state machines formally:

Definition 8 (Extended Finite State Machine) Let $\Sigma$ be an alphabet. Then an
extended finite state machine (EFSM for short) over $\Sigma$ is a triple $(S, Init, \Delta)$ where
(i) $S$ is a finite set of states, (ii) $Init \subseteq S$ is a set of initial states, and (iii)
$\Delta \subseteq (S \times Expr \times \Sigma \times S) \cup (S \times Expr \times LValue \times Expr \times S)$ is a transition relation.

The only subtle aspect of the above definition is the description of the transition
relation. As can be seen, the transition relation $\Delta$ consists of two kinds of elements.
The first kind of element is a 4-tuple of the form $(s_1, g, \alpha, s_2)$. This represents a
transition from state $s_1$ to state $s_2$ guarded by the expression $g$ with the command
being the action $\alpha$. We will denote such a transition by $s_1 \xrightarrow{g/\alpha} s_2$. The second kind of
element is a 5-tuple of the form $(s_1, g, l, r, s_2)$. This represents a transition from state
$s_1$ to state $s_2$ guarded by the expression $g$ with the command being the assignment
$l := r$. We will denote such a transition by $s_1 \xrightarrow{g/l:=r} s_2$.

The set of all EFSMs is denoted by $\mathcal{FSM}$. Let $F \in \mathcal{FSM}$ be any EFSM. Then $F$
has a distinguished state STOP which has no outgoing transitions. Intuitively, STOP
represents the termination of a library routine.
Example 8 Consider a library routine lib with a single parameter p.
Suppose that the behavior of lib can be described as follows:


• If lib is invoked with a true (non-zero) argument, it either does action $\alpha$ and
assigns 1 to variable v or does action $\beta$ and assigns 2 to variable v.

•  If  lib  is  invoked  with  a  false  (zero)  argument,  it  assigns  0  to  variable  v. 

Then  the  behavior  of  lib  can  be  expressed  by  the  EFSM  shown  in  Figure  3.3. 


Figure  3.3:  The  EFSM  corresponding  to  the  library  routine  from  Example  8.  The  initial  state 
is  indicated  by  an  incoming  transition  with  no  source  state. 
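
To make guarded commands concrete, the following sketch encodes one possible EFSM for the routine lib of Example 8 and enumerates its terminating runs. The state names `INIT`, `A` and `B`, the tuple layout, and the use of Python callables for guards and right-hand sides are all assumptions; Figure 3.3 is not reproduced here and may draw the machine differently. For brevity the store maps variable names, rather than addresses, to values.

```python
# Action transition:     (source, guard, ("act", name), target)
# Assignment transition: (source, guard, ("asgn", variable, rhs), target)
def always(st):
    return True


EFSM_LIB = {
    "init": {"INIT"},
    "delta": [
        ("INIT", lambda st: st["p"] != 0, ("act", "alpha"), "A"),
        ("A", always, ("asgn", "v", lambda st: 1), "STOP"),
        ("INIT", lambda st: st["p"] != 0, ("act", "beta"), "B"),
        ("B", always, ("asgn", "v", lambda st: 2), "STOP"),
        ("INIT", lambda st: st["p"] == 0, ("asgn", "v", lambda st: 0), "STOP"),
    ],
}


def runs(efsm, store):
    """Enumerate (action trace, final store) for every run that reaches STOP.
    Terminates only for acyclic EFSMs such as the one above."""
    results = []

    def go(state, st, trace):
        if state == "STOP":
            results.append((tuple(trace), st))
            return
        for (src, guard, cmd, dst) in efsm["delta"]:
            if src == state and guard(st):
                if cmd[0] == "act":
                    go(dst, st, trace + [cmd[1]])
                else:
                    go(dst, {**st, cmd[1]: cmd[2](st)}, trace)

    for s in efsm["init"]:
        go(s, dict(store), [])
    return results


outcomes_true = {(t, st["v"]) for (t, st) in runs(EFSM_LIB, {"p": 1, "v": 9})}
outcomes_false = {(t, st["v"]) for (t, st) in runs(EFSM_LIB, {"p": 0, "v": 9})}
```

With a non-zero argument the machine has two runs, one per action, and with a zero argument it has a single run that performs no action.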

A  context  for  a  component  C  will  employ  EFSMs  to  express  the  behavior  of  the 
library  routines  invoked  by  C.  However,  in  order  to  describe  the  semantics  of  C ,  we 
need  to  also  know  about  the  initial  states,  set  of  atomic  propositions  and  alphabet 
of  the  semantic  LKS.  As  mentioned  before,  this  information  will  also  be  provided  by 
the  context.  We  now  define  a  context  formally. 

Definition 9 (Component Context) A context for a component $C$ is a 5-tuple
$(InitCond, AP, \Sigma, Silent, FSM)$ where (i) $InitCond \in Expr$ is an initial condition,
(ii) $AP \subseteq \mathcal{AP}$ is a finite set of atomic propositions, (iii) $\Sigma \subseteq ObsAct$ is a
set of observable actions, (iv) $Silent \in SilAct \setminus \{\tau\}$ is a silent action, and (v)
$FSM : Call(C) \to \mathcal{FSM}$ is a function mapping call statements of $C$ to EFSMs
over the alphabet $\Sigma \cup \{Silent\}$.

Example 9 Recall the component $C$ from Figure 3.2. Now we let $\gamma$ be the context
for $C$ such that: (i) $InitCond_\gamma = true$, (ii) $AP_\gamma = \emptyset$, (iii) $\Sigma_\gamma = \{\alpha, \beta, \chi, \delta\}$,
(iv) $Silent_\gamma = \eta$, and (v) recall that $C$ has four call statements $\{5, 6, 7, 8\}$ which
invoke library routines alpha, chi, beta and delta respectively; then $FSM_\gamma$ maps the
call statements $\{5, 6, 7, 8\}$ respectively to the EFSMs $F_\alpha$, $F_\chi$, $F_\beta$ and $F_\delta$ shown in
Figure 3.4. Intuitively this means that these routines simply perform the actions
$\alpha$, $\chi$, $\beta$ and $\delta$ respectively.


Figure  3.4:  The  EFSMs  corresponding  to  the  context  from  Example  9. 

Note that a context is specific to $C$ since it must provide EFSMs for only the call
statements of $C$. Let $\gamma = (InitCond, AP, \Sigma, Silent, FSM)$ be any context for $C$.
Then the semantics of $C$ under $\gamma$ is denoted by $[\![C]\!]_\gamma$. In the rest of this section the
context $\gamma$ will be fixed and therefore we will often omit it when it is obvious. For
example, we will write $[\![C]\!]$ to mean $[\![C]\!]_\gamma$. Formally, $[\![C]\!]$ is an LKS such that:

• $[\![C]\!]$ has two kinds of states: normal and inlined. A normal state is simply a
pair consisting of a statement of $C$ and a store.

$S^{normal} = Stmt_C \times Store$

An inlined state is obtained by inlining EFSMs at corresponding call statements.
Recall that for any call statement $s$, $FSM(s)$ denotes the EFSM corresponding
to $s$. Therefore, an inlined state is simply a triple $(s, \sigma, \iota)$ where $s$ is a call
statement, $\sigma$ is a store and $\iota$ is a state of $FSM(s)$. More formally the set of
inlined states $S^{inlined}$ is defined as:

$S^{inlined} = \{(s, \sigma, \iota) \mid s \in Call(C) \wedge \sigma \in Store \wedge \iota \in S_{FSM(s)}\}$

where $S_F$ denotes the set of states of an EFSM $F$. Finally, a state of $[\![C]\!]$ is
either a normal state or an inlined state.

$S_{[\![C]\!]} = S^{normal} \cup S^{inlined}$


• An initial state of $\llbracket C \rrbracket$ corresponds to the entry statement of C and a store that
satisfies the initial condition InitCond specified by the context $\gamma$.

$$Init_{\llbracket C \rrbracket} = \{(entry_C, \sigma) \mid \sigma \models InitCond\}$$


• Recall that AP is the set of atomic propositions specified by the context $\gamma$. The
atomic propositions of $\llbracket C \rrbracket$ are the same as those specified by $\gamma$.

$$AP_{\llbracket C \rrbracket} = AP$$

• The labeling function $L_{\llbracket C \rrbracket}$ of $\llbracket C \rrbracket$ is consistent with the expressions corresponding
to the atomic propositions. Recall that the bijection Concrete maps atomic
propositions to expressions. Since the propositional labeling does not depend
on the inlined EFSM state, its definition is identical for normal and inlined
states. More formally:

$$L_{\llbracket C \rrbracket}(s, \sigma) = L_{\llbracket C \rrbracket}(s, \sigma, \iota) = \{p \in AP_{\llbracket C \rrbracket} \mid \sigma \models Concrete(p)\}$$

• The alphabet of $\llbracket C \rrbracket$ contains the specified observable and silent actions. Recall
that $\Sigma$ is the set of observable actions associated with the context $\gamma$ and Silent
is the silent action associated with the context $\gamma$.

$$\Sigma_{\llbracket C \rrbracket} = \Sigma \cup \{Silent\}$$

Note that $\tau \notin \Sigma_{\llbracket C \rrbracket}$. This fact will be used crucially for the compositional
verification technique presented later in this thesis.

• The transition relation of the semantics $\llbracket C \rrbracket$ is defined in the following section.

3.2.2 Transition Relation of $\llbracket C \rrbracket$

In the rest of this section we will write Type, Then, Else, Cond, LHS and RHS to
mean $Type_C$, $Then_C$, $Else_C$, $Cond_C$, $LHS_C$ and $RHS_C$ respectively. We will describe
outgoing transitions from normal and inlined states separately.

Normal States. Let $(s, \sigma)$ be a normal state of $\llbracket C \rrbracket$. Recall that $s \in Stmt_C$ is a
statement of C and $\sigma \in Store$. We consider each possible value of the type Type(s)
of s separately.

• Type(s) = EXIT. In this case $(s, \sigma)$ has no outgoing transitions.

• Type(s) = BRAN. Recall that Cond(s) is the branch condition associated
with s while Then(s) and Else(s) are the then and else successors of s.
In this case $(s, \sigma)$ performs the Silent action and moves to either Then(s)
or Else(s) depending on the satisfaction of the branch condition. The store
remains unchanged. Formally:

$$\sigma \models Cond(s) \implies (s, \sigma) \xrightarrow{Silent} (Then(s), \sigma)$$

$$\sigma \not\models Cond(s) \implies (s, \sigma) \xrightarrow{Silent} (Else(s), \sigma)$$

• Type(s) = ASGN. In this case $(s, \sigma)$ performs the Silent action and moves to
the then successor while the store is updated as per the assignment. Formally,
for any assignment statement a, let $\sigma[a]$ be the store such that the following
two conditions hold:

$$\forall x \in D \;.\; x \neq Add(\sigma, LHS(s)) \implies \sigma[a](x) = \sigma(x)$$

$$\sigma[a](Add(\sigma, LHS(s))) = Value(\sigma, RHS(s))$$

In other words, $\sigma[a]$ is the store obtained by updating $\sigma$ with the assignment a.
Recall that LHS(s) and RHS(s) are the left and right hand side expressions
associated with s. Then:

$$(s, \sigma) \xrightarrow{Silent} (Then(s), \sigma[LHS(s) := RHS(s)])$$

• Type(s) = CALL. In this case $(s, \sigma)$ performs the Silent action and moves to
an initial state of the EFSM FSM(s) corresponding to s. The store remains
unchanged. Recall that $Init_{FSM(s)}$ denotes the set of initial states of FSM(s).
Then:

$$\forall \iota \in Init_{FSM(s)} \;.\; (s, \sigma) \xrightarrow{Silent} (s, \sigma, \iota)$$

Inlined States. Let $s \in Call(C)$, $\sigma \in Store$ and $(s, \sigma, \iota)$ be an inlined state. Recall
that in this case $\iota$ must be a state of the EFSM FSM(s) corresponding to s. Also
recall that the transitions of FSM(s) are labeled with guarded commands of the form
g/a or g/l := r. We consider four possible types of outgoing transitions of FSM(s)
from the state $\iota$.

• $\iota \xrightarrow{g/a} \iota'$ and $\iota' \neq STOP$. If the store $\sigma$ satisfies the guard g, then $(s, \sigma, \iota)$
performs action a and moves to the inlined state corresponding to $\iota'$. The store
remains unchanged. Formally:

$$\sigma \models g \implies (s, \sigma, \iota) \xrightarrow{a} (s, \sigma, \iota')$$

• $\iota \xrightarrow{g/a} \iota'$ and $\iota' = STOP$. If the store $\sigma$ satisfies the guard g, then $(s, \sigma, \iota)$
performs action a and returns from the library routine call. The store remains
unchanged. Formally:

$$\sigma \models g \implies (s, \sigma, \iota) \xrightarrow{a} (Then(s), \sigma)$$

• $\iota \xrightarrow{g/l := r} \iota'$ and $\iota' \neq STOP$. If the store $\sigma$ satisfies the guard g, then $(s, \sigma, \iota)$
performs Silent and moves to the inlined state corresponding to $\iota'$. The store
is updated as per the assignment l := r. Recall that $\sigma[l := r]$ denotes the new
store obtained by updating $\sigma$ with l := r. Then:

$$\sigma \models g \implies (s, \sigma, \iota) \xrightarrow{Silent} (s, \sigma[l := r], \iota')$$

• $\iota \xrightarrow{g/l := r} \iota'$ and $\iota' = STOP$. If the store $\sigma$ satisfies the guard g, then $(s, \sigma, \iota)$
performs Silent and returns from the library routine call. The store is updated
as per the assignment l := r. Recall that $\sigma[l := r]$ denotes the new store
obtained by updating $\sigma$ with l := r. Then:

$$\sigma \models g \implies (s, \sigma, \iota) \xrightarrow{Silent} (Then(s), \sigma[l := r])$$
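
The rules for normal states can be illustrated with a small step function in C. This is only a sketch under simplifying assumptions that are not part of the formal definition: the `Stmt` record, the two-variable integer `Store`, and the use of function pointers for Cond(s) and RHS(s) are hypothetical, and call statements with their inlined EFSMs are left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define NVARS 2                      /* hypothetical store with two integer variables */

typedef struct { int v[NVARS]; } Store;

typedef enum { EXIT, BRAN, ASGN } StmtType;

typedef struct Stmt {
    StmtType type;
    bool (*cond)(const Store *);     /* Cond(s), used when type == BRAN */
    int lhs;                         /* index of the variable LHS(s), used when type == ASGN */
    int (*rhs)(const Store *);       /* RHS(s), evaluated in the current store */
    const struct Stmt *then_succ;    /* Then(s) */
    const struct Stmt *else_succ;    /* Else(s) */
} Stmt;

typedef struct { const Stmt *s; Store sigma; } NormalState;

/* Performs one Silent step from a normal state. Returns false when the
   state has no outgoing transition (Type(s) = EXIT). */
bool step(const NormalState *in, NormalState *out)
{
    const Stmt *s = in->s;
    *out = *in;
    switch (s->type) {
    case EXIT:
        return false;
    case BRAN:                       /* store unchanged, successor chosen by Cond(s) */
        out->s = s->cond(&in->sigma) ? s->then_succ : s->else_succ;
        return true;
    case ASGN:                       /* sigma[LHS(s) := RHS(s)], then move to Then(s) */
        out->sigma.v[s->lhs] = s->rhs(&in->sigma);
        out->s = s->then_succ;
        return true;
    }
    return false;
}

/* Example component: if (x > 0) y = x + 1; exit   (x is v[0], y is v[1]) */
static bool x_positive(const Store *st) { return st->v[0] > 0; }
static int  x_plus_one(const Store *st) { return st->v[0] + 1; }

static const Stmt stmt_exit   = { EXIT, NULL, 0, NULL, NULL, NULL };
static const Stmt stmt_assign = { ASGN, NULL, 1, x_plus_one, &stmt_exit, NULL };
static const Stmt stmt_branch = { BRAN, x_positive, 0, NULL, &stmt_assign, &stmt_exit };
```

Starting from the state consisting of `stmt_branch` and a store with x = 3, two calls to `step` reach the exit statement with y = 4, and a third call reports that no transition exists.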


3.3  Restrictions  on  C  Programs 

We assume that our input C programs are in the CIL [92] format. This can be easily
achieved by preprocessing the input C programs using the CIL [30] tool developed
by Necula et al. The CIL format is essentially a very simple subset of C with
precise semantics. For instance, the format does not allow procedure calls inside the
argument of another procedure call. Also, expressions with side-effects (such as x++)
and shortcut evaluations (such as a && b) are disallowed. However, CIL is expressive
enough that any valid C program can be translated into an equivalent CIL program.
This is precisely what the CIL tool achieves.

Additionally, we disallow pointer dereferences on the left-hand sides of assignments
as well as the use of function pointers in making indirect library routine calls. This
can be achieved using alias information about the pointers being dereferenced. For
instance, suppose we have the following code fragment:

*p = e;

(*fp)();

Given  that  the  pointer  p  can  point  to  either  variables  x  or  y.z,  and  the  pointer  fp 
can  point  to  either  routines  foo  or  bar,  we  can  rewrite  the  above  code  fragment  into 
the  following  while  preserving  its  semantics: 

if (p == &x) x = e;
else y.z = e;
if (fp == &foo) foo();
else bar();
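
One way to convince oneself that the rewrite preserves behavior when the alias information is accurate is to run both forms side by side. The sketch below is illustrative only: the struct `S`, the globals, and the bodies of `foo` and `bar` (which merely record which routine ran) are hypothetical stand-ins for whatever the real component declares.

```c
/* Hypothetical globals matching the fragment: p may point to x or y.z,
   fp may point to foo or bar. */
struct S { int z; };
static int x;
static struct S y;
static int last_called;              /* records which routine ran: 1 = foo, 2 = bar */

static void foo(void) { last_called = 1; }
static void bar(void) { last_called = 2; }

/* Original form: pointer dereference on the left-hand side and an
   indirect call through a function pointer. */
void original(int *p, void (*fp)(void), int e)
{
    *p = e;
    (*fp)();
}

/* Rewritten form: the aliasing possibilities become branch conditions. */
void rewritten(int *p, void (*fp)(void), int e)
{
    if (p == &x) x = e;
    else y.z = e;
    if (fp == &foo) foo();
    else bar();
}
```

The two functions agree only for arguments that respect the assumed alias sets. If `p` could point anywhere else, the `else` branch of the rewritten form would silently write to `y.z`, which is why the transformation depends on sound alias information.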

Note  that  we  could  also  include  such  aliasing  information  in  a  component’s  context 
since  the  semantics  of  a  component  clearly  depends  on  its  aliasing  scenario.  However, 
we  chose  not  to  do  this  for  two  main  reasons.  First,  aliasing  scenarios  are  more  integral 
to  a  component’s  definition  than  information  which  should  be  present  in  a  context. 
While  a  context  is  like  a  component’s  environment,  aliasing  information  is  usually 
embedded  more  directly  in  the  actual  source  code  and  could  be  extracted,  e.g.,  via 
alias analysis. More importantly, the code transformation above converts aliasing
possibilities into branch conditions. These branch conditions can be subsequently
used  to  infer  appropriate  predicates  for  abstraction  refinement.  The  abstraction 
refinement  procedure  in  MAGIC  is  presented  in  further  detail  in  Chapter  6.  Finally, 
preprocessing  away  pointer  dereferences  on  the  left-hand-sides  of  assignments  will 
simplify  our  presentation  of  predicate  abstraction  in  Chapter  4.  In  particular,  it 
will  enable  us  to  use  a  standard  version  of  the  weakest  precondition  operator  (cf. 
Definition  15)  instead  of  an  extended  version.  This  usually  leads  to  simpler  theorem 
prover  queries  while  performing  admissibility  checks. 

3.4  Symbolic  Representation  of  States 

Let State denote the set of all states of the concrete semantics $\llbracket C \rrbracket$ of component
C with respect to context $\gamma$. At several points in the rest of this thesis, we will
require the ability to manipulate (possibly infinite) subsets of State. In this section
we present such a framework. The basic idea is to represent the statements explicitly
and to model the stores symbolically using C expressions.

Consider any subset S of the set of states State of $\llbracket C \rrbracket$. Recall that each element
of S is either a normal state of the form $(s, \sigma)$ or an inlined state $(s, \sigma, \iota)$ where s is
a statement of C, $\sigma$ is a store and $\iota$ is an inlined EFSM state. Since both s and $\iota$ are
finitary, we can easily partition S into a finite number of equivalence classes where
all the elements of a particular equivalence class agree on their s and $\iota$ components.
In other words, for a fixed s and $\iota$, the equivalence class $S(s, \iota)$ is defined as follows:

$$S(s, \iota) = \{(s', \sigma) \in S \mid s' = s\} \cup \{(s', \sigma, \iota') \in S \mid s' = s \wedge \iota' = \iota\}$$

Clearly, the states within a particular equivalence class differ only in their $\sigma$
components. Now suppose that for each equivalence class $S(s, \iota)$ there exists an
expression e such that $(s, \sigma) \in S \iff \sigma \models e$ and $(s, \sigma, \iota) \in S \iff \sigma \models e$. Then we
can represent the equivalence class $S(s, \iota)$ symbolically using the triple $(s, \iota, e)$ and
hence we can represent the entire set S using a collection of such triples, one for each
value of s and $\iota$.

Clearly not all subsets of State are representable (or expressible) by the above
scheme. This follows from a simple counting argument: only countably many S’s
are expressible while the set of all possible S’s, i.e., $2^{State}$, is clearly uncountable.
However, as we shall see in the rest of this section, the expressible S’s form an
important subset of $2^{State}$ which we shall denote by $\mathcal{E}$. A set $S \subseteq State$ is said to be
expressible if it belongs to $\mathcal{E}$. We now present this notion formally.

Recall that $Stmt_C$ denotes the set of statements of C and for any call statement
s of C, FSM(s) denotes the EFSM corresponding to s. Instead of using triples of
the form $(s, \iota, e)$ we will represent an expressible set of states using a function. In
particular, let D be the set of all valid statements s and pairs $(s, \iota)$ of statements and
inlined EFSM states. In other words:

$$D = Stmt_C \cup \{(s, \iota) \mid s \in Call(C) \wedge \iota \in S_{FSM(s)}\}$$

where $S_F$ denotes the set of states of an EFSM F. Then a representation $R : D \to Expr$
is a function from the set D to the set of expressions.

Definition 10 A representation is a function $D \to Expr$. A representation R
corresponds to a set S of concrete states iff the following two conditions hold:

$$\forall s \in D \;.\; \forall \sigma \in Store \;.\; (s, \sigma) \in S \iff \sigma \models R(s)$$

$$\forall (s, \iota) \in D \;.\; \forall \sigma \in Store \;.\; (s, \sigma, \iota) \in S \iff \sigma \models R(s, \iota)$$

Let us denote the set of all representations by Rep. Recall that State denotes the set
of all states of $\llbracket C \rrbracket$. For any $R \in Rep$ and any $S \subseteq State$, we will write $R \equiv S$ to
mean that R corresponds to S.

A set of concrete states $S \subseteq State$ is said to be expressible iff there exists a
representation R such that $R \equiv S$. We denote the set of all expressible subsets of
State by $\mathcal{E}$. In other words:

$$\mathcal{E} = \{S \subseteq State \mid \exists R \in Rep \;.\; R \equiv S\}$$

Theorem 4 We note below a set of simple results about expressible sets of states.

1. The set State of all states of $\llbracket C \rrbracket$ is expressible.

2. The set $Init_{\llbracket C \rrbracket}$ of initial states of $\llbracket C \rrbracket$ is expressible.

3. If S is expressible then so is its complement $State \setminus S$.

4. If $S_1$ and $S_2$ are expressible then so are $S_1 \cup S_2$ and $S_1 \cap S_2$.

5. For any proposition p the set of states labeled with p is expressible.

Proof. Recall that D denotes the domain of any representation.

1. The following representation R corresponds to State: $\forall d \in D \;.\; R(d) = \text{TRUE}$.

2. Recall that any initial state of $\llbracket C \rrbracket$ is of the form $(entry_C, \sigma)$ where $entry_C$ is
the entry statement and the store $\sigma$ satisfies the initial guard InitCond of the
context $\gamma$. Hence the following representation R corresponds to the set $Init_{\llbracket C \rrbracket}$
of initial states of $\llbracket C \rrbracket$.

$$R(entry_C) = InitCond \qquad \forall d \in D \;.\; d \neq entry_C \implies R(d) = \text{FALSE}$$

3. Let R be a representation corresponding to S. Then the following representation
R' corresponds to $State \setminus S$: $\forall d \in D \;.\; R'(d) = \neg R(d)$.

4. Let $R_1$ and $R_2$ be representations corresponding to $S_1$ and $S_2$ respectively. Then
the following representations $R_\cup$ and $R_\cap$ correspond to $S_1 \cup S_2$ and $S_1 \cap S_2$
respectively.

$$\forall d \in D \;.\; R_\cup(d) = R_1(d) \vee R_2(d)$$

$$\forall d \in D \;.\; R_\cap(d) = R_1(d) \wedge R_2(d)$$

5. Recall that the bijection Concrete maps atomic propositions to expressions.
Let p be any proposition. Then the following representation R corresponds to
the set of states labeled with p: $\forall d \in D \;.\; R(d) = Concrete(p)$.

□
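
The constructions in this proof are purely syntactic, which the following C sketch tries to make concrete. It is an illustration under assumptions introduced here, not a description of any actual implementation: the domain D is taken to be the indices 0..NLOC-1, expressions are stored as C source text, and the names `Rep`, `rep_init`, `rep_complement`, `rep_union` and `rep_intersect` are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

#define NLOC 3        /* hypothetical number of elements of the domain D */
#define EXPR_LEN 128  /* hypothetical bound on the length of an expression */

/* A representation R : D -> Expr, with expressions kept as C source text. */
typedef struct { char expr[NLOC][EXPR_LEN]; } Rep;

/* True iff an snprintf result indicates the text fit into EXPR_LEN bytes. */
static bool fits(int n) { return n >= 0 && n < EXPR_LEN; }

/* Item 2: InitCond at the entry location, 0 (false) everywhere else. */
bool rep_init(const char *init_cond, int entry, Rep *out)
{
    for (int d = 0; d < NLOC; d++)
        if (!fits(snprintf(out->expr[d], EXPR_LEN, "%s",
                           d == entry ? init_cond : "0")))
            return false;
    return true;
}

/* Item 3: R'(d) = !R(d). The output must not alias the input. */
bool rep_complement(const Rep *r, Rep *out)
{
    for (int d = 0; d < NLOC; d++)
        if (!fits(snprintf(out->expr[d], EXPR_LEN, "!(%s)", r->expr[d])))
            return false;
    return true;
}

/* Item 4: R(d) = R1(d) op R2(d). The output must not alias the inputs. */
static bool rep_combine(const Rep *r1, const Rep *r2, const char *op, Rep *out)
{
    for (int d = 0; d < NLOC; d++)
        if (!fits(snprintf(out->expr[d], EXPR_LEN, "(%s) %s (%s)",
                           r1->expr[d], op, r2->expr[d])))
            return false;
    return true;
}

bool rep_union(const Rep *r1, const Rep *r2, Rep *out)
{
    return rep_combine(r1, r2, "||", out);
}

bool rep_intersect(const Rep *r1, const Rep *r2, Rep *out)
{
    return rep_combine(r1, r2, "&&", out);
}
```

No simplification is attempted, so repeated operations grow the expressions. Deciding whether a resulting expression is satisfiable is a separate question that this sketch does not address.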

In  the  rest  of  this  thesis  we  will  manipulate  sets  of  concrete  states  using  their 
representations.  In  particular,  we  will  use  the  following  two  functions  for  restriction 
with  respect  to  a  set  of  atomic  propositions  and  pre-image  computation  with  respect 
to  an  action. 

3.4.1  Restriction 

The function $Restrict : \mathcal{E} \times 2^{AP} \to \mathcal{E}$ restricts an expressible set of states with respect
to a propositional labeling. Intuitively, Restrict(S, P) contains every state in S with
propositional labeling P. Formally:

$$Restrict(S, P) = \{s \in S \mid L_{\llbracket C \rrbracket}(s) = P\}$$

3.4.2 Pre-image

The function $PreImage : \mathcal{E} \times \Sigma \to \mathcal{E}$ maps an expressible set of states to its pre-image
under a particular action. Intuitively, PreImage(S, a) contains every state which has
an a-successor in S. Formally:

$$PreImage(S, a) = \{s \in S_{\llbracket C \rrbracket} \mid Succ_{\llbracket C \rrbracket}(s, a) \cap S \neq \emptyset\}$$

We note that due to Theorem 4 both Restrict and PreImage are effectively
computable in the sense that if a representation corresponding to the argument
S is available, then we can also compute a representation corresponding to the
final result. Pre-image computation is possible since we are only concerned with
assignments, if-then-else’s etc. and not, for example, with while statements.
Similar approaches for representing and manipulating sets of states symbolically have
been used previously by, among others, Clarke [31] and Cook [43]. Finally, emptiness
and universality of an expressible set of states S can be checked using a theorem prover
if, once again, we have a representation corresponding to S. Of course, emptiness and
universality are undecidable in general.

3.5  Program 

A  program  consists  of  a  set  of  components.  The  execution  of  a  program  involves  the 
concurrent  execution  of  its  constituent  components. 

Definition 11 (Program) A program $\mathcal{P}$ is a finite sequence $(C_1, \ldots, C_n)$ where each
$C_i$ is a component.

Naturally, a context for $\mathcal{P}$ must consist of a sequence of component contexts, one
for each $C_i$. In addition, the silent actions of each of these component contexts must
be different. This prohibits the components from synchronizing with each other on
their silent actions during execution.

Definition 12 (Program Context) A context $\Gamma$ for a program $\mathcal{P} = (C_1, \ldots, C_n)$ is
a sequence of component contexts $(\gamma_1, \ldots, \gamma_n)$ such that: (i) $\forall i \in \{1, \ldots, n\} \;.\; \gamma_i$ is
a context for component $C_i$, and
(ii) $\forall i \in \{1, \ldots, n\} \;.\; \forall j \in \{1, \ldots, n\} \;.\; i \neq j \implies Silent_{\gamma_i} \neq Silent_{\gamma_j}$.

The semantics of a program is obtained by the parallel composition of the
semantics of its components.

Definition 13 (Program Semantics) Let $\mathcal{P} = (C_1, \ldots, C_n)$ be a program and
$\Gamma = (\gamma_1, \ldots, \gamma_n)$ be a context for $\mathcal{P}$. The semantics of $\mathcal{P}$ under $\Gamma$, denoted by $\llbracket \mathcal{P} \rrbracket_\Gamma$,
is an LKS defined as follows:

$$\llbracket \mathcal{P} \rrbracket_\Gamma = \llbracket C_1 \rrbracket_{\gamma_1} \parallel \cdots \parallel \llbracket C_n \rrbracket_{\gamma_n}$$

Chapter 4
Abstraction

In this chapter we will present our abstraction scheme used for obtaining finite LKS
models from C programs. An LKS $\widehat{M}$ is said to be an abstraction of an LKS M if
$M \preceq \widehat{M}$. Therefore, for our purposes, the concepts of abstraction and simulation
are synonymous. First, we will present a general notion of abstraction based on
abstraction  mappings.  Later  on  we  will  describe  a  specific  abstraction  technique 
called  predicate  abstraction  [63]  which  we  actually  employ  for  model  construction. 
This  abstraction  technique  has  also  been  used  by  others  for  the  verification  of  both 
hardware  [33]  and  software  [6,  66]  systems. 

4.1  Abstraction  mapping 

Let M and $\widehat{M}$ be two LKSs. A function $H : S_M \to S_{\widehat{M}}$ is said to be an abstraction
mapping iff it obeys the following conditions:

• $\forall s \in S_M \;.\; L_M(s) = L_{\widehat{M}}(H(s))$

• $\forall (s, a, s') \in T_M \;.\; (H(s), a, H(s')) \in T_{\widehat{M}}$

• $\forall s \in Init_M \;.\; H(s) \in Init_{\widehat{M}}$


The  following  well-known  result  [36]  captures  the  relationship  between  abstraction 
mappings  and  abstractions.  Abstractions  obtained  via  abstraction  mappings, 
particularly  in  the  context  of  hardware  verification,  are  often  referred  to  in  the 
literature  as  existential  abstractions. 

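Theorem 5 Let M and $\widehat{M}$ be two LKSs such that: (i) $AP_M = AP_{\widehat{M}}$, (ii) $\Sigma_M = \Sigma_{\widehat{M}}$,
and (iii) there exists an abstraction mapping $H : S_M \to S_{\widehat{M}}$. Then $M \preceq \widehat{M}$.

Proof. Define the relation $\mathcal{R} = \{(s, H(s)) \mid s \in S_M\}$. Prove that (i) $\mathcal{R}$ is a simulation
relation, and (ii) $\forall s \in Init_M \;.\; \exists \widehat{s} \in Init_{\widehat{M}} \;.\; s \,\mathcal{R}\, \widehat{s}$. Claim (i) follows from the first two
conditions on H, and claim (ii) holds with $\widehat{s} = H(s)$ by the third.

□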
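
For finite LKSs the three conditions on an abstraction mapping can be checked directly. The following C sketch does so under assumptions made purely for illustration: states and actions are small integers, labels are bitmasks over the atomic propositions, and the `LKS` and `Trans` records are hypothetical encodings.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int src, act, dst; } Trans;

typedef struct {
    int nstates;
    const unsigned *label;   /* label[s]: bitmask of atomic propositions true at s */
    const bool *init;        /* init[s]: whether s is an initial state */
    const Trans *trans;      /* transition relation as a list of triples */
    size_t ntrans;
} LKS;

static bool has_trans(const LKS *m, int src, int act, int dst)
{
    for (size_t i = 0; i < m->ntrans; i++)
        if (m->trans[i].src == src && m->trans[i].act == act &&
            m->trans[i].dst == dst)
            return true;
    return false;
}

/* Checks the three conditions, where h[s] is the abstract state H(s). */
bool is_abstraction_mapping(const LKS *m, const LKS *mhat, const int *h)
{
    for (int s = 0; s < m->nstates; s++) {
        if (m->label[s] != mhat->label[h[s]]) return false;   /* labels agree */
        if (m->init[s] && !mhat->init[h[s]]) return false;    /* initial maps to initial */
    }
    for (size_t i = 0; i < m->ntrans; i++) {                  /* transitions are preserved */
        const Trans *t = &m->trans[i];
        if (!has_trans(mhat, h[t->src], t->act, h[t->dst])) return false;
    }
    return true;
}
```

A mapping that merges two concrete states with different labels fails the first check, which matches the requirement that abstraction preserves the propositional labeling exactly.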


4.2  Predicate 

A  predicate  is  simply  a  C  expression.  Recall  that  any  C  expression  can  also  be  viewed 
as  a  formula  (cf.  Section  3.1.2).  Hence  our  use  of  expressions  as  predicates  is  perfectly 
natural. Let us denote the set of Boolean values {true, false} by $\mathbb{B}$. Let Pred be
a set of predicates. Recall that AP denotes the set of atomic propositions and that
the bijection Concrete maps atomic propositions to expressions. Let us denote the
inverse of Concrete by Prop. In other words, $Prop : Expr \to AP$ is a bijection
defined as follows:

$$\forall e \in Expr \;.\; Prop(e) = p \in AP \iff Concrete(p) = e$$

We extend the function Prop to operate over sets of expressions in the natural manner.

$$Prop(Pred) = \{Prop(e) \mid e \in Pred\}$$

Prop(Pred) can be thought of as the set of propositions obtained from the set
of predicates Pred, by replacing each predicate in Pred with its corresponding
proposition.

Let $AP = \{p_1, \ldots, p_k\}$ be a set of propositions. Since each element of AP
can be assigned a Boolean value, a valuation of AP is simply a mapping from
AP to $\mathbb{B}$. The set of all valuations of AP is denoted by PropVal(AP). Given a
proposition p and a Boolean value b, let us define $p^b$ as follows: if b = true then
$p^b = Concrete(p)$ else $p^b = \,!\,Concrete(p)$. Then the induced concretization function
$Concrete : PropVal(AP) \to Expr$ is defined as follows:

$$Concrete(V) = p_1^{V(p_1)} \;\&\&\; \cdots \;\&\&\; p_k^{V(p_k)}$$

where V(p) denotes the value of proposition p according to the valuation V. As a
special case, if $AP = \emptyset$, then it has just one valuation which we denote by $\bot$ and we
adopt the convention that $Concrete(\bot) = 1$. Recall that 1 represents true as per
the C expression semantics. This means that an empty valuation always concretizes
to TRUE.
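
The concretization of a valuation can be sketched in C as plain string construction. The function name `concretize`, the array-based encoding of a valuation, and the choice to parenthesize each predicate are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Writes Concrete(V) into buf, where preds[i] is the text of Concrete(p_i)
   and val[i] is V(p_i). For k == 0 the result is "1", following the
   convention for the empty valuation. Returns false if buf is too small. */
bool concretize(const char *const preds[], const bool val[], size_t k,
                char *buf, size_t len)
{
    size_t used = 0;
    if (k == 0) {
        int n = snprintf(buf, len, "1");
        return n >= 0 && (size_t)n < len;
    }
    for (size_t i = 0; i < k; i++) {
        /* Each conjunct is the predicate itself or its negation. */
        int n = snprintf(buf + used, len - used, "%s%s(%s)",
                         i ? " && " : "", val[i] ? "" : "!", preds[i]);
        if (n < 0 || (size_t)n >= len - used) return false;
        used += (size_t)n;
    }
    return true;
}
```

For the predicates `x > 0` and `y == x` with the valuation (true, false), the function produces the text `(x > 0) && !(y == x)`.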

The key idea behind predicate abstraction is that sets of concrete stores
are abstractly represented by propositional valuations. In particular, a valuation V
abstractly represents the set of stores $\sigma$ which satisfy the concretization Concrete(V)
of V. Moreover, just as there is a notion of satisfaction $\models$ of expressions by stores,
there is a notion of abstract satisfaction of expressions by valuations. Intuitively,
a valuation V abstractly satisfies an expression e iff there is a store $\sigma$ such that
V abstractly represents $\sigma$ and $\sigma \models e$. We call this notion of abstract satisfaction
admissibility and present it formally next.

Definition 14 (Admissibility) Let AP be a set of propositions, $V \in PropVal(AP)$
be a valuation of AP and $e \in Expr$ be an expression. Recall that the C expression
Concrete(V) denotes the concretization of the valuation V. Then V is said to be
admissible with e, denoted by $V \Vdash e$, iff the following condition holds:

$$\exists \sigma \in Store \;.\; \sigma \models Concrete(V) \;\&\&\; e$$

If $V' \in PropVal(AP')$ is a valuation of another set of propositions AP', then we
write $V \Vdash V'$ to mean $V \Vdash Concrete(V')$.


Admissibility and Satisfaction. We have intentionally used a symbol for
denoting admissibility ($\Vdash$) which is similar to that used to denote satisfaction ($\models$)
of an expression by a store. Our intention is to highlight the fact that admissibility
is  essentially  a  form  of  consistency  between  a  valuation  and  an  expression  or  between 
two  valuations.  At  an  abstract  level,  admissibility  plays  a  role  similar  to  that  of 
satisfaction  or  logical  entailment.  This  correspondence  is  further  highlighted  by 
the  similarity  between  the  description  of  the  concrete  and  abstract  semantics  of  a 
component  presented  in  Chapter  3  and  Chapter  4  respectively. 

Checking  Admissibility  in  Practice.  In  order  to  perform  predicate  abstraction 
it  will  be  essential  to  perform  several  admissibility  checks.  Suppose  we  wish  to 
check  admissibility  between  a  valuation  V  and  an  expression  e  (or  between  two 
valuations V and V'). This boils down to checking the satisfiability of the expression
Concrete(V) && e (or Concrete(V) && Concrete(V')). We will use a theorem prover
to discharge this satisfiability check. However, the problem is undecidable in general
and hence the theorem prover might time out with an "I don't know". In such
inconclusive cases, to guarantee the soundness of our predicate abstraction we must
make a conservative decision, i.e., assume that V and e (or V and V') are indeed
admissible.
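
The conservative treatment of inconclusive answers can be sketched as follows. The "prover" here is a hypothetical stand-in that searches a small box of stores with an evaluation budget. It is not a real theorem prover, and treating the box as the whole store space is an assumption made only so that the example can return all three answers.

```c
#include <stdbool.h>

typedef struct { int x, y; } Store;
typedef bool (*Expr)(const Store *);

typedef enum { PROVER_SAT, PROVER_UNSAT, PROVER_UNKNOWN } ProverResult;

/* Stand-in prover: looks for a store with x, y in [lo, hi] satisfying
   both f and g, and gives up after `budget` evaluations. */
ProverResult check_sat(Expr f, Expr g, int lo, int hi, long budget)
{
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++) {
            if (budget-- <= 0) return PROVER_UNKNOWN;
            Store s = { x, y };
            if (f(&s) && g(&s)) return PROVER_SAT;
        }
    return PROVER_UNSAT;
}

/* Conservative admissibility: only a definite UNSAT answer for
   Concrete(V) && e rules out admissibility. */
bool admissible(Expr concrete_v, Expr e, int lo, int hi, long budget)
{
    return check_sat(concrete_v, e, lo, hi, budget) != PROVER_UNSAT;
}

/* Example expressions over the hypothetical store. */
static bool x_pos(const Store *s)         { return s->x > 0; }
static bool x_neg(const Store *s)         { return s->x < 0; }
static bool y_is_x_plus_1(const Store *s) { return s->y == s->x + 1; }
```

With a generous budget, `x_pos` is admissible with `y_is_x_plus_1` but not with `x_neg`. With a budget too small to finish the search, `x_pos` and `x_neg` are reported as admissible, which may add spurious behavior to the abstraction but never removes real behavior.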

Definition 15 (Weakest Precondition) Given an expression e, and an
assignment a of the form lhs = rhs, the weakest precondition [54, 70] of e
with respect to a, denoted by $WP[a]\{e\}$, is the expression obtained from e by
simultaneously replacing each occurrence of lhs with rhs.
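
As a small worked instance, take e to be x < y and a to be x = x + 1, so that the substitution gives (x + 1) < y. The C sketch below checks, on a small range of stores, that this expression holds before the assignment exactly when e holds after it. The two-variable `Store` and all function names are hypothetical and chosen only for this example.

```c
#include <stdbool.h>

typedef struct { int x, y; } Store;

/* a:  x = x + 1, applied to a copy of the store */
static Store apply_a(Store s) { s.x = s.x + 1; return s; }

/* e:  x < y */
static bool e(Store s) { return s.x < s.y; }

/* WP[a]{e}: e with each occurrence of x replaced by x + 1 */
static bool wp_a_e(Store s) { return (s.x + 1) < s.y; }

/* Returns true iff, for every store with x, y in [lo, hi], WP[a]{e}
   holds before the assignment exactly when e holds after it. */
bool wp_matches_on_box(int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++) {
            Store s = { x, y };
            if (wp_a_e(s) != e(apply_a(s))) return false;
        }
    return true;
}
```

The check is a sanity test on finitely many stores rather than a proof, and it relies on the left-hand side being a plain variable, in line with the restriction on pointer dereferences from Section 3.3.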


4.3  Predicate  Mapping 

In  order  to  do  predicate  abstraction,  we  need  to  know  the  set  of  predicates  associated 
with  each  statement  of  a  component.  This  association  is  captured  by  a  predicate 
mapping. Let C be a component and $\gamma$ be a context for C. Recall that $AP_\gamma$ is the
set of atomic propositions specified by $\gamma$. Then a predicate mapping is a function
from the statements of C to sets of predicates such that we have sufficient predicates
to determine whether an atomic proposition $p \in AP_\gamma$ holds or does not hold at
any abstract state. Recall that Concrete(p) denotes the expression associated with
any atomic proposition p. Therefore, for any $p \in AP_\gamma$, and for any statement s, a
predicate mapping must associate the expression Concrete(p) as a predicate at s. We
now give a more formal definition.

Definition 16 (Predicate Mapping) Let C be a component and $\gamma$ be a context for
C. Recall that $Stmt_C$ denotes the set of statements of C, $AP_\gamma$ denotes the set of atomic
propositions specified by $\gamma$, and Concrete is a mapping from the atomic propositions
to expressions. A function $\Pi : Stmt_C \to 2^{Expr}$ is said to be a predicate mapping for C
compatible with $\gamma$ iff the following condition holds:

$$\forall s \in Stmt_C \;.\; \forall p \in AP_\gamma \;.\; Concrete(p) \in \Pi(s)$$

In other words, $\Pi$ is compatible with $\gamma$ iff the set of predicates associated by
$\Pi$ with every statement s of C contains the predicates corresponding to the atomic
propositions specified by the context $\gamma$. The predicate mapping $\Pi$ will be fixed in the
rest of this chapter. Hence, for any statement s we will simply write PropVal(s) to
mean $PropVal(Prop(\Pi(s)))$ where $Prop(\Pi(s))$ denotes the set of atomic propositions
corresponding to the set of predicates associated with s by $\Pi$.

4.4  Predicate  Abstraction 

We  are  now  ready  to  present  predicate  abstraction  formally.  We  first  present  predicate 
abstraction  for  a  component  and  then  extend  it  to  a  program.  The  reader  is  advised 
to perform a comparative study of the remainder of this section and Section 3.2.1.
This will give the reader a clearer picture of the relationship between component
semantics and predicate abstraction, and an intuitive grasp of the significance
and correctness of Theorem 6.

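Let C be a component, $\gamma = (InitCond, AP, \Sigma, Silent, FSM)$ be any context
for C and $\Pi$ be a predicate mapping for C compatible with $\gamma$. Then the predicate
abstraction of C under $\gamma$ and with respect to $\Pi$ is denoted by $\llbracket\llbracket C \rrbracket\rrbracket_\gamma^\Pi$. In the rest of
this section the context $\gamma$ and the predicate mapping $\Pi$ will be fixed and therefore
we will often omit them when they are obvious. For example, we will write $\llbracket\llbracket C \rrbracket\rrbracket$ to
mean $\llbracket\llbracket C \rrbracket\rrbracket_\gamma^\Pi$. Formally, $\llbracket\llbracket C \rrbracket\rrbracket$ is an LKS such that: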

• $\llbracket\llbracket C \rrbracket\rrbracket$ has two kinds of states: normal and inlined. A normal state is simply a
pair consisting of a statement of C and a valuation.

$$S^{normal} = \{(s, V) \mid s \in Stmt_C \wedge V \in PropVal(s)\}$$

where PropVal(s) denotes the set of valuations of the propositions
corresponding to the set of predicates associated with the statement s. An
inlined state is obtained by inlining EFSMs at corresponding call statements.
Recall that for any call statement s, FSM(s) denotes the EFSM corresponding
to s. Therefore, an inlined state is simply a triple $(s, V, \iota)$ where s is a call
statement, V is a valuation and $\iota$ is a state of FSM(s). More formally, the set
of inlined states $S^{inlined}$ is defined as:

$$S^{inlined} = \{(s, V, \iota) \mid s \in Call(C) \wedge V \in PropVal(s) \wedge \iota \in S_{FSM(s)}\}$$

where PropVal(s) denotes the set of valuations of the propositions
corresponding to the set of predicates associated with the statement s, and
$S_F$ denotes the set of states of an EFSM F. Finally, a state of $\llbracket\llbracket C \rrbracket\rrbracket$ is either a
normal state or an inlined state.

$$S_{\llbracket\llbracket C \rrbracket\rrbracket} = S^{normal} \cup S^{inlined}$$


• An initial state of $\llbracket\llbracket C \rrbracket\rrbracket$ corresponds to the entry statement $entry_C$ of C and a
valuation that is admissible with the initial condition InitCond specified by the
context $\gamma$.

$$Init_{\llbracket\llbracket C \rrbracket\rrbracket} = \{(entry_C, V) \mid V \Vdash InitCond\}$$

• Recall that AP is the set of atomic propositions specified by the context $\gamma$. The
atomic propositions of $\llbracket\llbracket C \rrbracket\rrbracket$ are the same as those specified by $\gamma$.

$$AP_{\llbracket\llbracket C \rrbracket\rrbracket} = AP$$

• The labeling function $L_{\llbracket\llbracket C \rrbracket\rrbracket}$ of $\llbracket\llbracket C \rrbracket\rrbracket$ is consistent with the valuations. Since the
propositional labeling does not depend on the inlined EFSM state, its definition
is identical for normal and inlined states. More formally:

$$L_{\llbracket\llbracket C \rrbracket\rrbracket}(s, V) = L_{\llbracket\llbracket C \rrbracket\rrbracket}(s, V, \iota) = \{p \in AP \mid V(p) = \text{true}\}$$

Note that in the above definitions, the value V(p) is always well-defined because
the predicate mapping $\Pi$ is compatible with the context $\gamma$. This ensures that
for any atomic proposition p, and any statement s of the component C, the
concretization Concrete(p) of p always belongs to the predicate set $\Pi(s)$ associated
with the statement s by the predicate mapping $\Pi$.

• The alphabet of $\llbracket\llbracket C \rrbracket\rrbracket$ contains the specified observable and silent actions. Recall
that $\Sigma$ is the set of observable actions associated with the context $\gamma$ and Silent
is the silent action associated with the context $\gamma$.

$$\Sigma_{\llbracket\llbracket C \rrbracket\rrbracket} = \Sigma \cup \{Silent\}$$

Note once again that $\tau \notin \Sigma_{\llbracket\llbracket C \rrbracket\rrbracket}$ and that this fact will be used for the
compositional verification technique presented later in this thesis.

• The transition relation of the predicate abstraction $\llbracket\llbracket C \rrbracket\rrbracket$ is defined in the
following section.

4.4.1 Transition Relation of $\llbracket\llbracket C \rrbracket\rrbracket$

In the rest of this section we will write Type, Then, Else, Cond, LHS and RHS to
mean $Type_C$, $Then_C$, $Else_C$, $Cond_C$, $LHS_C$ and $RHS_C$ respectively. We will describe
outgoing transitions from normal and inlined states separately.

Normal States. Let (s, V) be a normal state of $\llbracket\llbracket C \rrbracket\rrbracket$. Recall that $s \in Stmt_C$ is a
statement of C and $V \in PropVal(s)$ is a valuation of the propositions corresponding
to the set of predicates associated with s. We consider each possible value of Type(s)
separately.

• Type(s) = EXIT. In this case (s, V) has no outgoing transitions.

• Type(s) = BRAN. Recall that Cond(s) is the branch condition associated
with s while Then(s) and Else(s) are the then and else successors of s. In
this case (s, V) performs the Silent action and moves to Then(s) or Else(s)
depending on the satisfaction of the branch condition. The new valuation
must be admissible with the old one. Let $V'_{then} \in PropVal(Then(s))$ and
$V'_{else} \in PropVal(Else(s))$. Then:

$$V \Vdash Cond(s) \wedge V \Vdash V'_{then} \implies (s, V) \xrightarrow{Silent} (Then(s), V'_{then})$$

$$V \Vdash \,!\,Cond(s) \wedge V \Vdash V'_{else} \implies (s, V) \xrightarrow{Silent} (Else(s), V'_{else})$$

• $Type(s) = \mathrm{ASGN}$. In this case $(s, V)$ performs the $Silent$ action and moves to
the then successor while the valuation is updated as per the assignment. Recall
that $LHS(s)$ and $RHS(s)$ are the left and right hand side expressions associated
with $s$. Formally, let $V' \in PropVal(Then(s))$ and $e$ be the expression
$\mathcal{WP}[LHS(s) := RHS(s)](Concrete(V'))$. Then:

$V \Vdash e \implies (s, V) \xrightarrow{Silent} (Then(s), V')$

• $Type(s) = \mathrm{CALL}$. In this case $(s, V)$ performs the $Silent$ action and moves to an
initial state of the EFSM $FSM(s)$ corresponding to $s$. The valuation remains
unchanged. Recall that $Init_{FSM(s)}$ denotes the set of initial states of $FSM(s)$.
Then:

$\forall \iota \in Init_{FSM(s)} \cdot (s, V) \xrightarrow{Silent} (s, V, \iota)$

Inlined States. Let $s \in Call(C)$, $V \in PropVal(s)$ and $(s, V, \iota)$ be an inlined state.
Recall that in this case $\iota$ must be a state of the EFSM $FSM(s)$ corresponding to $s$.
Also recall that the transitions of $FSM(s)$ are labeled with guarded commands of
the form $g/a$ or $g/l := r$. We consider four possible types of outgoing transitions of
$FSM(s)$ from the state $\iota$.

• $\iota \xrightarrow{g/a} \iota'$ and $\iota' \neq \mathrm{STOP}$. If the valuation $V$ is admissible with the guard $g$,
$(s, V, \iota)$ performs action $a$ and moves to the inlined state corresponding to $\iota'$.
The valuation remains unchanged. Formally:

$V \Vdash g \implies (s, V, \iota) \xrightarrow{a} (s, V, \iota')$

• $\iota \xrightarrow{g/a} \iota'$ and $\iota' = \mathrm{STOP}$. If the valuation $V$ is admissible with the guard
$g$, $(s, V, \iota)$ performs action $a$ and returns from the library routine call. The
new valuation must be admissible with the old one. Formally, let $V' \in
PropVal(Then(s))$. Then:

$V \Vdash g \wedge V \Vdash V' \implies (s, V, \iota) \xrightarrow{a} (Then(s), V')$

• $\iota \xrightarrow{g/l := r} \iota'$ and $\iota' \neq \mathrm{STOP}$. If the valuation $V$ is admissible with the guard
$g$, $(s, V, \iota)$ performs $Silent$ and moves to the inlined state corresponding to
$\iota'$. The valuation is updated as per the assignment $l := r$. Formally, let
$V' \in PropVal(s)$ and $e$ be the expression $\mathcal{WP}[l := r](Concrete(V'))$. Then:

$V \Vdash g \wedge V \Vdash e \implies (s, V, \iota) \xrightarrow{Silent} (s, V', \iota')$

• $\iota \xrightarrow{g/l := r} \iota'$ and $\iota' = \mathrm{STOP}$. If the valuation $V$ is admissible with the guard $g$,
$(s, V, \iota)$ performs $Silent$ and returns from the library routine call. The valuation
is updated as per the assignment $l := r$. Formally, let $V' \in PropVal(Then(s))$
and $e$ be the expression $\mathcal{WP}[l := r](Concrete(V'))$. Then:

$V \Vdash g \wedge V \Vdash e \implies (s, V, \iota) \xrightarrow{Silent} (Then(s), V')$


Figure  4.1:  The  component  from  Example  9. 

Example 10 Recall the component $C$ and the context $\gamma$ from Figure 4.1 and
Example 9. The key thing to remember is that according to $\gamma$, the library routines
alpha, beta, chi and delta perform the actions $\alpha$, $\beta$, $\chi$ and $\delta$ respectively and terminate.
Let $\Pi$ be the predicate mapping of $C$ which maps every statement of $C$ to $\emptyset$. Hence the
set of valuations for the propositions corresponding to the set of predicates associated
with each statement of $C$ is simply $\{\bot\}$.

Figure 4.2 shows the reachable states of the LKS $[\![C]\!]^{\Pi}_{\gamma}$. Since the valuations are
always $\bot$ we omit them for simplicity. Thus the normal states are only labeled by the
associated component statement. The inlined states are also labeled by the state of the
EFSMs associated with their corresponding library routine calls. In particular these
are the initial states of the EFSMs $F_\alpha$, $F_\beta$, $F_\chi$ and $F_\delta$ shown in Figure 3.4, and are
denoted by $Init_\alpha$, $Init_\beta$, $Init_\chi$ and $Init_\delta$ respectively.

Figure 4.2: The LKS obtained by predicate abstraction of the component in Figure 4.1.

The  fact  that  a  predicate  abstraction  is  indeed  an  abstraction  is  captured  by  the 
following  theorem. 

Theorem 6 Let $C$ be a component, $\gamma$ be a context for $C$ and $\Pi : Stmt_C \to 2^{Expr}$ be a
predicate mapping for $C$ compatible with $\gamma$. Then $[\![C]\!]_{\gamma} \preccurlyeq [\![C]\!]^{\Pi}_{\gamma}$.

Proof. The key idea is to define an abstraction mapping from the concrete states
of $[\![C]\!]_{\gamma}$ to the abstract states of $[\![C]\!]^{\Pi}_{\gamma}$. The mapping must reflect the fact that
abstract states are obtained from concrete states by representing stores abstractly
using valuations. In other words, the correspondence between concrete and abstract
states must be captured by the correspondence between the stores and the valuations.

This is achieved by the function $\mathcal{H} : S_{[\![C]\!]_{\gamma}} \to S_{[\![C]\!]^{\Pi}_{\gamma}}$. Intuitively, $\mathcal{H}$ maps a concrete
state $s$ to an abstract state $s'$ iff the store associated with $s$ satisfies the concretization
of the valuation associated with $s'$. Formally, $\mathcal{H}$ is defined as follows:

$\mathcal{H}(s, \sigma) = (s, V) \text{ such that } \sigma \models Concrete(V)$

$\mathcal{H}(s, \sigma, \iota) = (s, V, \iota) \text{ such that } \sigma \models Concrete(V)$

We can prove that $\mathcal{H}$ is an abstraction mapping. Then the result follows from
Theorem 5.


□ 

The  fact  that  simulation  is  a  congruence  with  respect  to  parallel  composition 
means  that  predicate  abstraction  can  be  performed  on  a  program  component-wise. 
To  present  this  idea  formally  we  need  to  extend  the  notion  of  predicate  mapping  and 
predicate  abstraction  to  programs. 

Definition 17 (Program Predicate Mapping) Let $\mathcal{P} = (C_1, \ldots, C_n)$ be a
program and $\Gamma = (\gamma_1, \ldots, \gamma_n)$ be a context for $\mathcal{P}$. A predicate mapping for $\mathcal{P}$
compatible with $\Gamma$ is a sequence $(\Pi_1, \ldots, \Pi_n)$ such that $\Pi_i$ is a predicate mapping
for $C_i$ compatible with $\gamma_i$ for $i \in \{1, \ldots, n\}$.

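Definition 18 (Program Predicate Abstraction) Let $\mathcal{P} = (C_1, \ldots, C_n)$ be a
program, $\Gamma = (\gamma_1, \ldots, \gamma_n)$ be a context for $\mathcal{P}$ and $\Pi = (\Pi_1, \ldots, \Pi_n)$ be a predicate
mapping for $\mathcal{P}$ compatible with $\Gamma$. Then the predicate abstraction of $\mathcal{P}$ under $\Gamma$ and
with respect to $\Pi$, denoted by $[\![\mathcal{P}]\!]^{\Pi}_{\Gamma}$, is the LKS defined as follows:

$[\![\mathcal{P}]\!]^{\Pi}_{\Gamma} = [\![C_1]\!]^{\Pi_1}_{\gamma_1} \parallel \cdots \parallel [\![C_n]\!]^{\Pi_n}_{\gamma_n}$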

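Theorem 7 Let $\mathcal{P}$ be a program, $\Gamma$ be a context for $\mathcal{P}$ and $\Pi$ be a predicate mapping
for $\mathcal{P}$ compatible with $\Gamma$. Then the following holds:

$[\![\mathcal{P}]\!]_{\Gamma} \preccurlyeq [\![\mathcal{P}]\!]^{\Pi}_{\Gamma}$

Proof. Immediate from Theorem 3 and Theorem 6.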


□

4.5 Predicate Inference


Recall that the predicate abstraction of a component $C$ is parameterized by a
predicate mapping $\Pi$ for $C$. In the framework being presented in this thesis, $\Pi$ is
constructed on the basis of a set of seed branches of $C$ via a process of predicate
inference. Let us denote by $B_C$ the set of branch statements of $C$. Formally,
$B_C = \{s \in Stmt_C \mid Type(s) = \mathrm{BRAN}\}$. Further, for any EFSM $F$, let $Guard(F)$ denote
the set of guards associated with the transitions of $F$.


Procedure 4.1 PredInfer computes a predicate mapping for component $C$ that is
compatible with a context $\gamma$ using a set of seed branches. It continues as long as some condition
CONTINUE holds.

Algorithm PredInfer($C$, $\gamma$, $B$)
    let $C = (Stmt, Type, entry, Cond, LHS, RHS, Then, Else)$;
    let $\gamma = (InitCond, AP, \Sigma, Silent, FSM)$;
    for each $s \in Stmt$ let $\Pi[s] := \{Concrete(p) \mid p \in AP\}$;
    for each $s \in B$ let $\Pi[s] := \Pi[s] \cup \{Cond(s)\}$;
    for each $s \in Call(C)$ let $\Pi[s] := \Pi[s] \cup Guard(FSM(s))$;
    while (CONTINUE) do
        for each $s \in Stmt$ case $Type(s)$ of
            ASGN : let $a = LHS(s) := RHS(s)$;
                   $\Pi[s] := \Pi[s] \cup \{\mathcal{WP}[a](p) \mid p \in \Pi[Then(s)]\}$;
            CALL : $\Pi[s] := \Pi[s] \cup \Pi[Then(s)]$;
            BRAN : $\Pi[s] := \Pi[s] \cup \Pi[Then(s)] \cup \Pi[Else(s)]$;
    return $\Pi$;


Algorithm PredInfer, presented in Procedure 4.1, takes as input a component $C$,
a context $\gamma$ for $C$ and a set of branches $B \subseteq B_C$. It computes and returns a predicate
mapping for $C$ compatible with $\gamma$. Essentially, PredInfer works as follows. First it
initializes $\Pi$ with the expressions corresponding to propositions. This is crucial to
ensure that the final result is compatible with $\gamma$. Then PredInfer seeds $\Pi$ using the
branch conditions of the seed branches $B$ and guards from the EFSMs corresponding
to the call statements of $C$. Finally it iteratively adds new predicates to the statements
of $C$ on the basis of the predicates that have been already inferred at their successor
statements.

Figure 4.3: The component of Figure 4.1 with each statement labeled by inferred predicates
computed by PredInfer from the set of seed branches {3,4}.

Intuitively, whenever a predicate $p$ is inferred at a statement $s$, the weakest
precondition of $p$ is inferred at every predecessor statement of $s$. If the
predecessor is an assignment statement, the weakest precondition of $p$ with respect
to that assignment is computed in the obvious manner, i.e., by substituting the right
hand side for the left hand side in $p$. Otherwise the weakest precondition of $p$ is $p$
itself. This iterative procedure might not terminate in general but can be terminated
on the basis of some criterion (represented in Procedure 4.1 as CONTINUE) such as
the size of $\Pi$.
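The inference loop can be illustrated with a minimal Python sketch. This is not the implementation used in the framework: the dictionary encoding of statements, the string representation of predicates, the helper names wp_assign and pred_infer, and the use of a round bound max_rounds in place of CONTINUE are all assumptions made for illustration. The weakest precondition is computed by naive textual substitution, which is only adequate for simple scalar variables.

```python
import re

def wp_assign(lhs, rhs, pred):
    # Weakest precondition of `pred` under `lhs := rhs`: substitute (rhs)
    # for every whole-word occurrence of lhs (illustrative only).
    pattern = "\\b" + re.escape(lhs) + "\\b"
    return re.sub(pattern, lambda m: "(" + rhs + ")", pred)

def pred_infer(stmts, seeds, ap_exprs=(), call_guards=None, max_rounds=10):
    pi = {s: set(ap_exprs) for s in stmts}          # Concrete(p) for p in AP
    for s in seeds:                                 # seed branch conditions
        pi[s].add(stmts[s]["cond"])
    for s, guards in (call_guards or {}).items():   # EFSM guards at call sites
        pi[s] |= set(guards)
    for _ in range(max_rounds):                     # stands in for CONTINUE
        changed = False
        for s, st in stmts.items():
            before = len(pi[s])
            if st["type"] == "ASGN":
                pi[s] |= {wp_assign(st["lhs"], st["rhs"], p)
                          for p in pi[st["then"]]}
            elif st["type"] == "CALL":
                pi[s] |= pi[st["then"]]
            elif st["type"] == "BRAN":
                pi[s] |= pi[st["then"]] | pi[st["else"]]
            changed = changed or len(pi[s]) != before
        if not changed:                             # fixed point reached early
            break
    return pi

# Hypothetical component: 1: x := y + 1;  2: if (x > 0) ...;  3: exit.
stmts = {
    1: {"type": "ASGN", "lhs": "x", "rhs": "y + 1", "then": 2},
    2: {"type": "BRAN", "cond": "x > 0", "then": 3, "else": 3},
    3: {"type": "EXIT"},
}
pi = pred_infer(stmts, seeds={2})
```

In this hypothetical component the seed predicate at the branch is pushed backwards through the assignment, so statement 1 receives the substituted predicate while the exit statement receives none.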

Example 11 Recall the component $C$ from Figure 4.1 and the context $\gamma$ from
Example 10. Let $B$ be the set of branch statements {3,4}, i.e., the two branch
statements of $C$ with conditions $x$ and $y$ respectively. Then Figure 4.3 shows $C$ with
each statement labeled by the set of predicates inferred by PredInfer when invoked
with arguments $C$, $\gamma$ and $B$.

Chapter 5


Simulation 


As mentioned before, in our approach, verification amounts to checking that
the specification LKS simulates the implementation LKS. Therefore, we consider
simulation in more detail. In this chapter we will write $Im$ to denote the
implementation, $Sp$ to denote the specification, $\Sigma$ to denote $\Sigma_{Sp}$ and $AP$ to denote
$AP_{Sp}$. Recall that in general the implementation LKS will be obtained by predicate
abstraction of a program. In other words, $Im = [\![\mathcal{P}]\!]^{\Pi}_{\Gamma}$ (cf. Definition 18) where $\Gamma$
is a program context (cf. Definition 12) and $\Pi$ is a program predicate mapping (cf.
Definition 17). Also recall that $\Sigma_{Im} = \Sigma_{Sp}$ and $AP_{Im} = AP_{Sp}$. Further, we will
assume that $Im$ and $Sp$ have a single initial state each, i.e., $|Init_{Im}| = |Init_{Sp}| = 1$.
The extension to multiple initial states is straightforward. Finally, we will write
$Init_{Im}$ and $Init_{Sp}$ to denote the initial state of $Im$ and $Sp$ respectively.

5.1 Simulation Games


Consider an implementation $Im = (S_{Im}, Init_{Im}, AP_{Im}, L_{Im}, \Sigma_{Im}, T_{Im})$ and a
specification $Sp = (S_{Sp}, Init_{Sp}, AP_{Sp}, L_{Sp}, \Sigma_{Sp}, T_{Sp})$ such that $\Sigma_{Im} = \Sigma_{Sp}$ and
$AP_{Im} = AP_{Sp}$. Suppose we want to determine whether $Im \preccurlyeq Sp$. It is
well-known [110] that this can be verified using a two-player game between the
implementation $Im$ and the specification $Sp$. In each round of the game, the
implementation poses a challenge and the specification attempts to provide a response.
Each player has one pebble located on some state of his LKS which he can move along
transitions of his LKS. The location of the pebbles at the start of each round is called
a game state, or position, and is denoted by $(s_{Im}, s_{Sp})$ where $s_{Im}$ and $s_{Sp}$ are the
locations of the implementation's and specification's pebbles respectively.

A game position $(s_{Im}, s_{Sp})$ is said to be admissible iff the corresponding
implementation and specification states agree on the set of atomic propositions, i.e.,
$L_{Im}(s_{Im}) = L_{Sp}(s_{Sp})$. We will only consider admissible game positions. Further we
will assume that the position $(Init_{Im}, Init_{Sp})$ is admissible. If $(Init_{Im}, Init_{Sp})$ is not
admissible, $Im \not\preccurlyeq Sp$ holds trivially. From a given admissible position $(s_{Im}, s_{Sp})$, the
game proceeds as follows:

• Implementation Challenge. The implementation picks an action $\alpha$ and a
successor state $s'_{Im} \in Succ_{Im}(s_{Im}, \alpha)$ and moves its pebble to $s'_{Im}$. We denote
such a challenge as simply $(s_{Im}, s_{Sp}) \xrightarrow{\alpha} (s'_{Im}, ?)$.

• Specification Response. Recall that $L_{Im}(s'_{Im})$ denotes the propositional
labeling of state $s'_{Im}$. The specification responds by moving its pebble to a state
$s'_{Sp}$ such that $s'_{Sp}$ is an $\alpha$-successor of $s_{Sp}$ and $s'_{Sp}$ has the same propositional
labeling as $s'_{Im}$. In other words, $s'_{Sp} \in PSucc_{Sp}(s_{Sp}, \alpha, L_{Im}(s'_{Im}))$. Thus, the
specification completes the challenge $(s_{Im}, s_{Sp}) \xrightarrow{\alpha} (s'_{Im}, ?)$ into a transition
$(s_{Im}, s_{Sp}) \xrightarrow{\alpha} (s'_{Im}, s'_{Sp})$.

The game continues into the next round from position $(s'_{Im}, s'_{Sp})$. Note that the
response must involve the same action ($\alpha$) and atomic propositions ($L_{Im}(s'_{Im})$)
as the corresponding challenge. In particular, as per the definition of $PSucc$
(cf. Definition 2), $L_{Sp}(s'_{Sp}) = L_{Im}(s'_{Im})$ and hence position $(s'_{Im}, s'_{Sp})$ is once
again admissible.

•  Winning  Condition.  The  implementation  wins  iff  the  specification  is  unable 
to  respond  to  some  move  of  the  implementation. 

A simulation game is completely defined by $Im$, $Sp$ and the initial position. Let us
denote the simulation game with $(s_{Im}, s_{Sp})$ as the initial position by $Game(s_{Im}, s_{Sp})$.
A position $(s_{Im}, s_{Sp})$ is called a winning position iff $Im$ has a well-defined strategy to
win $Game(s_{Im}, s_{Sp})$. The relationship between simulation and simulation games is
well-known and is captured by Theorem 8.

Theorem 8 $Im \preccurlyeq Sp$ iff the implementation $Im$ does not have a strategy to win
$Game(Init_{Im}, Init_{Sp})$, i.e., iff $(Init_{Im}, Init_{Sp})$ is not a winning position.

As the implementation $Im$ can only win after a finite number of moves, it is easy
to see that every winning strategy for $Im$ in any simulation game can be described
by a finite tree with the following characteristic. For each position $(s_{Im}, s_{Sp})$, the tree
explains how $Im$ should pick a challenge $(s_{Im}, s_{Sp}) \xrightarrow{\alpha} (s'_{Im}, ?)$ in order to ultimately
win. Each such tree constitutes a counterexample for the simulation relation and will
be referred to as a Counterexample Tree. In general, for each game position, there
may exist several ways for $Im$ to challenge and still win eventually. This element of
choice leads to the existence of multiple Counterexample Trees.

We will now give a formal framework which describes the game in such a way
that Counterexample Trees can be easily extracted. We will write $Pos$ to mean the
set of all game positions, i.e., $Pos = S_{Im} \times S_{Sp}$. Let $Challenge$ denote the set of all
challenges. We begin by defining the function $Response : Challenge \to 2^{Pos}$ which
maps a challenge $c$ to the set of all new game positions that can result after $Sp$ has
responded to $c$.

$Response((s_{Im}, s_{Sp}) \xrightarrow{\alpha} (s'_{Im}, ?)) = \{s'_{Im}\} \times PSucc_{Sp}(s_{Sp}, \alpha, L_{Im}(s'_{Im}))$


Figure 5.1: Two simple LKSs, Im and Sp.

Example 12 Let $Im$ and $Sp$ be the LKSs from Figure 5.1. From position $(S2, T2)$,
$Im$ can pose the following two challenges due to two possible moves from $S2$ on action
$b$.

$(S2, T2) \xrightarrow{b} (S3, ?)$ and $(S2, T2) \xrightarrow{b} (S4, ?)$

For each of these challenges $Sp$ can respond in two ways due to two possible moves
from $T2$ on action $b$.

$Response((S2, T2) \xrightarrow{b} (S3, ?)) = \{(S3, T4), (S3, T5)\}$

$Response((S2, T2) \xrightarrow{b} (S4, ?)) = \{(S4, T4), (S4, T5)\}$
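The function $Response$ can be sketched in a few lines of Python. The encoding below is an assumption made for illustration: an LKS is a dictionary with a "label" map from states to sets of propositions and a "trans" map from (state, action) pairs to successor sets, and a challenge is a triple (position, action, new implementation state). Since Figure 5.1 is not reproduced here, only the $b$-transitions named in Example 12 are encoded and all labels are assumed to be empty.

```python
def succ(lks, s, a):
    # All a-successors of state s.
    return lks["trans"].get((s, a), set())

def psucc(lks, s, a, props):
    # a-successors of s whose propositional labeling equals props.
    return {t for t in succ(lks, s, a) if lks["label"][t] == props}

def response(im, sp, challenge):
    # Positions that can result after Sp answers the given challenge.
    (s_im, s_sp), a, s_im_next = challenge
    return {(s_im_next, t)
            for t in psucc(sp, s_sp, a, im["label"][s_im_next])}

# Hypothetical fragment consistent with Example 12.
none = frozenset()
im = {"label": {"S2": none, "S3": none, "S4": none},
      "trans": {("S2", "b"): {"S3", "S4"}}}
sp = {"label": {"T2": none, "T4": none, "T5": none},
      "trans": {("T2", "b"): {"T4", "T5"}}}

r3 = response(im, sp, (("S2", "T2"), "b", "S3"))
r4 = response(im, sp, (("S2", "T2"), "b", "S4"))
```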




5.2  Strategy  Trees  as  Counterexamples 


Formally, a Counterexample Tree for $Game(s_{Im}, s_{Sp})$ is given by a labeled tree
$(N, E, r, St, Ch)$ where:

• $N$, the set of nodes, describes the states of the winning strategy

• $E \subseteq N \times N$, the set of edges, describes the transitions between these states

• $r \in N$ is the root of the tree

• $St : N \to Pos$ maps each tree node to a game position

• $Ch : N \to Challenge$ maps each tree node $n$ to the challenge that $Im$ must
pose from position $St(n)$ in accordance with the strategy

Note that, for a given node $n$, if $St(n) = (s_{Im}, s_{Sp})$ then $Ch(n) = (s_{Im}, s_{Sp}) \xrightarrow{\alpha}
(s'_{Im}, ?)$ for some action $\alpha$ and successor state $s'_{Im} \in Succ_{Im}(s_{Im}, \alpha)$. Also,
let $Child(n)$ denote the set of children of $n$. Then the Counterexample Tree
$(N, E, r, St, Ch)$ has to satisfy the following conditions:

CE1 The root of the tree is mapped to the initial game state, i.e., $St(r) =
(Init_{Im}, Init_{Sp})$.

CE2 The children of a node $n$ cover $Response(Ch(n))$, i.e., the game positions to
which the response of $Sp$ can lead. In other words:

$Response(Ch(n)) = \{St(c) \mid c \in Child(n)\}$

CE3 The leaves of the tree are mapped to victorious challenges, i.e., challenges from
which the specification has no response. In other words, a leaf node $l$ has to
obey the following condition: $Response(Ch(l)) = \emptyset$.
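Conditions CE1 to CE3 can be checked mechanically. The following Python sketch assumes a hypothetical encoding in which each tree node is a dictionary with fields "st", "ch" and "children", and in which $Response$ is supplied as a callable. It does not verify that the node table is actually a tree; that is taken for granted here.

```python
def is_counterexample_tree(nodes, root, init_pos, response):
    # CE1: the root is mapped to the initial game position.
    if nodes[root]["st"] != init_pos:
        return False
    for node in nodes.values():
        position, challenge = node["st"], node["ch"]
        # The challenge of a node must start from the node's own position.
        if challenge[0] != position:
            return False
        child_positions = {nodes[c]["st"] for c in node["children"]}
        # CE2: children cover exactly the possible responses.
        # CE3 is the special case of a leaf: no children, so no responses.
        if response(challenge) != child_positions:
            return False
    return True

# Hypothetical two-node strategy: challenge on a, then a victorious
# challenge on c to which the specification has no response.
c0 = (("S1", "T1"), "a", "S2")
c1 = (("S2", "T2"), "c", "S3")
responses = {c0: {("S2", "T2")}, c1: set()}
good = {0: {"st": ("S1", "T1"), "ch": c0, "children": [1]},
        1: {"st": ("S2", "T2"), "ch": c1, "children": []}}
bad = {0: {"st": ("S1", "T1"), "ch": c0, "children": []}}
```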




Example 13 Consider again $Im$ and $Sp$ from Figure 5.1. Figure 5.2 shows a
Counterexample Tree for $Game(S1, T1)$. Inside each node $n$ we show the challenge
$Ch(n)$.


Figure  5.2:  Counterexample  Tree  for  a  simulation  game. 


5.3  Checking  Simulation 

In this section we describe a verification algorithm that checks whether $Im \preccurlyeq Sp$ and
computes a Counterexample Tree if $Im \not\preccurlyeq Sp$. Recall that a Counterexample Tree
describes a winning strategy for the implementation $Im$ to win the simulation
game. We will first describe the algorithm ComputeWinPos which computes the
set of winning positions along with their associated challenges; this data is then used
to construct a Counterexample Tree.

The Algorithm ComputeWinPos is described in Procedure 5.1. It collects
the winning positions of $Im$ in the set $WinPos$. Starting with $WinPos = \emptyset$, it
adds new winning positions to $WinPos$ until no more winning positions can be
found. Note that in the first iteration $WinPos = \emptyset$, and therefore the condition
$Response(c) \subseteq WinPos$ amounts to $Response(c) = \emptyset$. The latter condition in turn
expresses that $c$ is a victorious challenge.

Procedure 5.1 ComputeWinPos computes the set $WinPos$ of winning positions for the
implementation $Im$; the challenges are stored in $Chal$.

Algorithm ComputeWinPos($Im$, $Sp$)
    $WinPos$, $Chal$ := $\emptyset$;
    forever do
        find challenge $c := (s_{Im}, s_{Sp}) \xrightarrow{\alpha} (s'_{Im}, ?)$ such that $Response(c) \subseteq WinPos$
            // all responses are winning positions
        if not found return ($WinPos$, $Chal$);
        $WinPos := WinPos \cup \{(s_{Im}, s_{Sp})\}$;
        $Chal(s_{Im}, s_{Sp}) := c$;
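The least fixed point computation of Procedure 5.1 can be sketched in Python as follows. The LKS encoding (dictionaries "label" and "trans") and the two small example structures are assumptions made for illustration. The sketch restricts attention to admissible positions and skips positions that are already known to be winning, which the pseudocode leaves implicit; like the procedure itself, it is a naive iteration and not the N-HORNSAT based implementation.

```python
def compute_win_pos(im, sp):
    def response(s_sp, a, t_im):
        # Positions reachable when Sp answers the challenge to t_im on a.
        return {(t_im, t_sp) for t_sp in sp["trans"].get((s_sp, a), set())
                if sp["label"][t_sp] == im["label"][t_im]}

    def find_challenge(s_im, s_sp, win_pos):
        # A challenge from (s_im, s_sp) all of whose responses are winning.
        for (src, a), targets in im["trans"].items():
            if src != s_im:
                continue
            for t_im in sorted(targets):
                if response(s_sp, a, t_im) <= win_pos:
                    return ((s_im, s_sp), a, t_im)
        return None

    positions = [(s, t) for s in im["label"] for t in sp["label"]
                 if im["label"][s] == sp["label"][t]]   # admissible only
    win_pos, chal = set(), {}
    changed = True
    while changed:                                      # least fixed point
        changed = False
        for pos in positions:
            if pos in win_pos:
                continue
            c = find_challenge(pos[0], pos[1], win_pos)
            if c is not None:
                win_pos.add(pos)
                chal[pos] = c
                changed = True
    return win_pos, chal

# Hypothetical LKSs: Im performs a then b, Sp can only perform a.
none = frozenset()
im = {"label": {"S1": none, "S2": none, "S3": none},
      "trans": {("S1", "a"): {"S2"}, ("S2", "b"): {"S3"}}}
sp = {"label": {"T1": none, "T2": none},
      "trans": {("T1", "a"): {"T2"}}}
win_pos, chal = compute_win_pos(im, sp)
```

Here the position (S2, T2) is found in the first pass because the challenge on b is victorious, and (S1, T1) is added in the second pass once its only response is known to be winning.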


In Procedure 5.2 we present the verification algorithm SimulCETree that works
as follows: it first invokes ComputeWinPos to compute the set $WinPos$ of
winning positions. If the initial position $(Init_{Im}, Init_{Sp})$ is not in $WinPos$, then
the implementation cannot win the simulation game $Game(Init_{Im}, Init_{Sp})$. In this
case, SimulCETree declares that "$Im \preccurlyeq Sp$" (recall Theorem 8) and terminates.
Otherwise, it invokes algorithm ComputeStrategy (presented in Procedure 5.3) to
compute a Counterexample Tree for $Game(Init_{Im}, Init_{Sp})$.

Theorem  9  Algorithm  SimulCETree  is  correct. 

Proof.  The  correctness  of  SimulCETree  follows  from  the  fact  that  the  maximal 
simulation  relation  between  Im  and  Sp  is  a  greatest  fixed  point  and  SimulCETree 
effectively  computes  its  complement. 


□ 

Algorithm ComputeStrategy takes the following as inputs: (i) a winning
position $(s_{Im}, s_{Sp})$, (ii) the set of all winning positions $WinPos$, and (iii) additional
challenge information $Chal$. It constructs a Counterexample Tree for the simulation
game $Game(s_{Im}, s_{Sp})$ and returns the root of this Counterexample Tree. Note that
at the top level, ComputeStrategy is invoked by SimulCETree with the winning
position $(Init_{Im}, Init_{Sp})$. This call therefore returns a Counterexample Tree for the
complete simulation game $Game(Init_{Im}, Init_{Sp})$.


Procedure 5.2 SimulCETree checks for simulation, and returns a Counterexample Tree
in case of violation.

Algorithm SimulCETree($Im$, $Sp$)
    ($WinPos$, $Chal$) := ComputeWinPos($Im$, $Sp$);
    if $(Init_{Im}, Init_{Sp}) \notin WinPos$ return "$Im \preccurlyeq Sp$";
    else return ComputeStrategy($Init_{Im}$, $Init_{Sp}$, $WinPos$, $Chal$);


When ComputeStrategy is invoked with position $(s_{Im}, s_{Sp})$, it first creates a
root node $r$ and associates position $(s_{Im}, s_{Sp})$ and challenge $Chal(s_{Im}, s_{Sp})$ with $r$.
It then considers all the positions reachable by responding to $Chal(s_{Im}, s_{Sp})$, i.e.,
all the positions with which the next round of the game might begin. For each of
these positions, ComputeStrategy constructs a Counterexample Tree by invoking
itself recursively. Finally, ComputeStrategy returns $r$ as the root of a new tree, in
which the children of $r$ are the roots of the recursively computed trees. Note that if
$Response(Chal(s_{Im}, s_{Sp})) = \emptyset$, i.e., if $Chal(s_{Im}, s_{Sp})$ is a victorious challenge, then $r$
becomes a leaf node as expected from condition CE3 above.

As described in Procedure 5.1, ComputeWinPos is essentially a least fixed
point algorithm for computing the set of winning positions $WinPos$ and additional
challenge information $Chal$. In fact, ComputeWinPos can be viewed as the dual
of the greatest fixed point algorithm for computing the maximal simulation relation
between $Im$ and $Sp$. Since a direct fixed point computation is quite expensive in
practice, ComputeWinPos as presented is naive; it is given here for its simplicity
and ease of understanding. In practice, ComputeWinPos is implemented by: (i) reducing it to
a satisfiability problem for weakly negated HORNSAT (N-HORNSAT) formulas and,
(ii) using an N-HORNSAT algorithm that not only checks for satisfiability but also
computes $WinPos$ and $Chal$. This procedure is presented in detail in Section 5.4.

5.3.1  Computing  Multiple  Counterexample  Trees 

For given $Im$ and $Sp$, the set of winning positions $WinPos$ computed by
ComputeWinPos is uniquely defined, i.e., each position $(s_{Im}, s_{Sp})$ is either the
root of some winning strategy (i.e., $(s_{Im}, s_{Sp}) \in WinPos$) or not (i.e., $(s_{Im}, s_{Sp}) \notin
WinPos$). There may, however, be multiple winning strategies from position
$(s_{Im}, s_{Sp})$, simply because there may be different challenges $Im$ can pose, all of
which will ultimately lead to $Im$'s victory.

In the algorithm ComputeWinPos, this is reflected by the fact that each
time the algorithm selects a challenge $c$, there may be several candidates for $c$,
and only one of them is stored in $Chal(s_{Im}, s_{Sp})$. The challenge information stored in
$Chal$ is subsequently used by ComputeStrategy, the algorithm which constructs the
winning strategy. Thus, depending on ComputeWinPos's choices for the challenges
$c$, ComputeStrategy will output different winning strategies. While all these
strategies are by construction winning strategies for $Im$, they may differ in various
aspects, for example, the tree size or the actions and states involved. In Section 6.5,
we will see that in our experiments, using a set of different winning strategies instead
of one indeed helps to save time and memory.

Procedure 5.3 ComputeStrategy recursively computes a winning strategy for showing
that $(s_{Im}, s_{Sp}) \in WinPos$; it outputs the root of the strategy tree.

Algorithm ComputeStrategy($s_{Im}$, $s_{Sp}$, $WinPos$, $Chal$)
    // $(s_{Im}, s_{Sp})$ is a winning position in $WinPos$
    create new tree node $r$ with $St(r) := (s_{Im}, s_{Sp})$ and $Ch(r) := Chal(s_{Im}, s_{Sp})$;
    for all $(c_{Im}, c_{Sp}) \in Response(Chal(s_{Im}, s_{Sp}))$
        create tree edge $r \to$ ComputeStrategy($c_{Im}$, $c_{Sp}$, $WinPos$, $Chal$);
    return $r$;
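The recursion of Procedure 5.3 can be sketched in Python as follows. The nested dictionary used for tree nodes and the callable response are assumptions of this sketch; $WinPos$ is omitted because the recursion only needs $Chal$. Termination relies on $Chal$ having been produced by ComputeWinPos, so that every response to a stored challenge is a position that was added to $WinPos$ earlier.

```python
def compute_strategy(pos, chal, response):
    # Root node for the winning position `pos`, labeled with its challenge.
    challenge = chal[pos]
    node = {"st": pos, "ch": challenge, "children": []}
    # One subtree per position the specification's response can lead to;
    # a victorious challenge has no responses and yields a leaf.
    for next_pos in sorted(response(challenge)):
        node["children"].append(compute_strategy(next_pos, chal, response))
    return node

# Hypothetical challenge table and responses, written out by hand.
c0 = (("S1", "T1"), "a", "S2")
c1 = (("S2", "T2"), "b", "S3")
chal = {("S1", "T1"): c0, ("S2", "T2"): c1}
responses = {c0: {("S2", "T2")}, c1: set()}
tree = compute_strategy(("S1", "T1"), chal, responses.__getitem__)
```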


5.4  Simulation  using  N-HORNSAT 

Given two LKSs $Im$ and $Sp$ we can verify whether $Im \preccurlyeq Sp$ efficiently by reducing
the problem to an instance of Boolean satisfiability [103] or SAT. Interestingly, the
SAT instances produced by this method always belong to a restricted class of SAT
formulas known as the weakly negated HORN formulas. The satisfiability problem for
such formulas is also known as N-HORNSAT. In contrast to general SAT (which has
no known polynomial time algorithm), N-HORNSAT can be solved in linear time [57].

In  this  section  we  present  the  N-HORNSAT  based  simulation  check  algorithm. 
We  also  describe  a  procedure  to  compute  the  set  of  winning  positions  WinPos 
and  associated  challenge  information  Chal  that  can  be  used  subsequently  by 
ComputeStrategy  to  compute  a  Counterexample  Tree  in  case  the  simulation  is 
found  not  to  exist.  We  begin  with  a  few  preliminary  definitions. 

5.4.1  Definitions 

A literal is either a boolean variable (in which case it is said to be positive) or its
negation (in which case it is said to be negative). A clause is a disjunction of literals,
i.e., a formula of the form $(l_1 \vee \cdots \vee l_m)$ where $l_i$ is a literal for $1 \le i \le m$. A formula
is said to be in conjunctive normal form (CNF) iff it is a conjunction of clauses, i.e.,
of the form $(c_1 \wedge \cdots \wedge c_n)$ where $c_i$ is a clause for $1 \le i \le n$.

Recall that $\mathbb{B}$ denotes the set of Boolean values {true, false}. A valuation is a
function from boolean variables to $\mathbb{B}$. A valuation $V$ automatically induces a function
$V$ from literals to $\mathbb{B}$ as follows: (i) $V(l) = V(b)$ if $l$ is of the form $b$ and (ii) $V(l) = \neg V(b)$
if $l$ is of the form $\neg b$. A valuation $V$ automatically induces a function $V$ from clauses
to $\mathbb{B}$ as follows. Let $c = (l_1 \vee \cdots \vee l_m)$ be a clause. Then $V(c) = \bigvee_{i=1}^{m} V(l_i)$. In the
same spirit, a valuation $V$ automatically induces a function $V$ from CNF formulas to
$\mathbb{B}$ as follows. Let $\phi = (c_1 \wedge \cdots \wedge c_n)$ be a CNF formula. Then $V(\phi) = \bigwedge_{i=1}^{n} V(c_i)$.
A CNF formula $\phi$ is said to be satisfiable iff there exists a valuation $V$ such that
$V(\phi) = \mathrm{true}$.

A CNF formula $(c_1 \wedge \cdots \wedge c_n)$ is said to be a weakly negated HORN (N-HORN)
formula iff each $c_i$ contains at most one negative literal for $1 \le i \le n$. The problem of
checking the satisfiability of an arbitrary N-HORN formula is known as N-HORNSAT.
There exists a well-known algorithm [4] for solving the N-HORNSAT problem that
requires linear time and space in the size of the input formula. We are now ready to
present the N-HORNSAT based simulation checking algorithm.
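To make these definitions concrete, the following Python sketch tests whether a CNF formula is N-HORN and decides its satisfiability. It is not the linear-time algorithm of [4]; it is a simple quadratic iteration, used here only for illustration. The encoding of a literal as a pair (variable, is_positive) and the function names are assumptions. The solver starts from the valuation that sets every variable to true and sets a variable to false only when some clause forces it, so a returned valuation is the maximal satisfying one.

```python
def is_n_horn(cnf):
    # Each clause may contain at most one negative literal.
    return all(sum(1 for _, positive in clause if not positive) <= 1
               for clause in cnf)

def solve_n_horn(cnf):
    # Returns a satisfying valuation (dict) or None if unsatisfiable.
    assert is_n_horn(cnf)
    val = {v: True for clause in cnf for v, _ in clause}
    changed = True
    while changed:
        changed = False
        for clause in cnf:
            if any(val[v] for v, positive in clause if positive):
                continue                  # satisfied by a positive literal
            negatives = [v for v, positive in clause if not positive]
            if not negatives:
                return None               # every positive literal is forced false
            if val[negatives[0]]:
                val[negatives[0]] = False # the negated variable must be false
                changed = True
    return val

# (not x or y) and (not y) and (x or z)
cnf = [[("x", False), ("y", True)], [("y", False)], [("x", True), ("z", True)]]
model = solve_n_horn(cnf)
# (x) and (not x) is unsatisfiable
unsat = solve_n_horn([[("x", True)], [("x", False)]])
```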

5.4.2  Reducing  Simulation  to  N-HORNSAT 

Let $Im$ and $Sp$ be two LKSs such that $\Sigma_{Im} = \Sigma_{Sp}$. Our goal is to create an
N-HORN formula $\phi(Im, Sp)$ such that $\phi(Im, Sp)$ is satisfiable iff $Im \preccurlyeq Sp$. For
each $s_{Im} \in S_{Im}$ and $s_{Sp} \in S_{Sp}$ we introduce a boolean variable that we denote
$WP(s_{Im}, s_{Sp})$. Intuitively, $WP(s_{Im}, s_{Sp})$ stands for the proposition that $(s_{Im}, s_{Sp})$ is
not a winning position. We then generate a set of clauses that constrain the various
boolean variables in accordance with the rules of a simulation game.

In particular suppose $WP(s_{Im}, s_{Sp})$ is true. Then $(s_{Im}, s_{Sp})$ is not a winning
position. Now suppose $s_{Im} \xrightarrow{\alpha}_{Im} s'_{Im}$. Then according to the rules of a simulation
game, there must exist a successor game state which is also not a winning position.
In other words, there must exist some state $s'_{Sp}$ such that: (i) $s_{Sp} \xrightarrow{\alpha}_{Sp} s'_{Sp}$, (ii)
$L_{Sp}(s'_{Sp}) = L_{Im}(s'_{Im})$, and (iii) $(s'_{Im}, s'_{Sp})$ is not a winning position. This argument
can be expressed formally by the following clause:

$WP(s_{Im}, s_{Sp}) \implies \bigvee_{s'_{Sp} \in PSucc_{Sp}(s_{Sp}, \alpha, L_{Im}(s'_{Im}))} WP(s'_{Im}, s'_{Sp})$

In essence, most of our target formula $\phi(Im, Sp)$ is composed of such clauses
(which we shall call the transition clauses), one for each appropriate choice of
$s_{Im}$, $s_{Sp}$, $\alpha$ and $s'_{Im}$. As a special case, when $PSucc_{Sp}(s_{Sp}, \alpha, L_{Im}(s'_{Im})) = \emptyset$, the
generated clause is simply $\neg WP(s_{Im}, s_{Sp})$. In addition to the transition clauses,
$\phi(Im, Sp)$ contains a single clause $WP(Init_{Im}, Init_{Sp})$ which expresses the constraint
that $(Init_{Im}, Init_{Sp})$ is not a winning position. Let us call this clause the initial clause.
The algorithm to generate $\phi(Im, Sp)$ is called GenerateHORN and is shown in
Procedure 5.4. Note that the generated $\phi(Im, Sp)$ is an N-HORN formula.

Procedure 5.4 GenerateHORN to generate $\phi(Im, Sp)$.

Algorithm GenerateHORN($Im$, $Sp$)
    for each $s_{Im} \in S_{Im}$, for each $s_{Sp} \in S_{Sp}$
        for each $\alpha \in \Sigma_{Im}$, for each $s'_{Im} \in Succ_{Im}(s_{Im}, \alpha)$
            output clause $WP(s_{Im}, s_{Sp}) \implies \bigvee_{s'_{Sp} \in PSucc_{Sp}(s_{Sp}, \alpha, L_{Im}(s'_{Im}))} WP(s'_{Im}, s'_{Sp})$
                // generate transition clause
    output clause $WP(Init_{Im}, Init_{Sp})$    // generate initial clause
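A Python sketch of the clause generation is given below. The LKS encoding (dictionaries "label" and "trans"), the representation of a literal as a pair (variable, is_positive) with game positions as variables, and the small example structures are assumptions made for illustration. Each transition clause $WP(s_{Im}, s_{Sp}) \implies \bigvee \ldots$ is emitted in its disjunctive form, i.e., one negative literal followed by the positive literals of the disjunction.

```python
def generate_horn(im, sp, init_im, init_sp):
    clauses = []
    for s_im in im["label"]:
        for s_sp in sp["label"]:
            for (src, a), targets in im["trans"].items():
                if src != s_im:
                    continue
                for t_im in sorted(targets):
                    # PSucc_Sp(s_sp, a, L_Im(t_im))
                    answers = [t_sp for t_sp in sp["trans"].get((s_sp, a), ())
                               if sp["label"][t_sp] == im["label"][t_im]]
                    # Transition clause: not WP(s_im, s_sp) or WP(t_im, t_sp) ...
                    clause = [((s_im, s_sp), False)]
                    clause += [((t_im, t_sp), True) for t_sp in sorted(answers)]
                    clauses.append(clause)
    clauses.append([((init_im, init_sp), True)])    # initial clause
    return clauses

# Hypothetical LKSs: Im performs a then b, Sp can only perform a.
none = frozenset()
im = {"label": {"S1": none, "S2": none, "S3": none},
      "trans": {("S1", "a"): {"S2"}, ("S2", "b"): {"S3"}}}
sp = {"label": {"T1": none, "T2": none},
      "trans": {("T1", "a"): {"T2"}}}
clauses = generate_horn(im, sp, "S1", "T1")
```

For these hypothetical structures the formula is unsatisfiable: the unit clause for (S2, T2) forces that variable to false, which in turn forces the variable of the initial position to false and contradicts the initial clause.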


The  above  method  of  checking  simulation  via  N-HORNSAT  is  well-known  [103]. 
Further,  N-HORNSAT  can  be  solved  in  linear  time  and  space  [57].  This  yields 
extremely efficient algorithms for checking simulation between two LKSs. In addition,
our CEGAR framework requires a counterexample if the simulation check fails.
As part of magic we have implemented an extended version of the N-HORNSAT
algorithm presented by Ausiello and Italiano [4] to achieve precisely this goal. In
other words, not only does our algorithm check for satisfiability of N-HORN formulas,
but it also constructs a counterexample for the simulation relation if the formula is
found to be unsatisfiable. To the best of my knowledge, this is the first attempt to
construct counterexamples in the context of simulation using SAT procedures.
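The clause generation of Procedure 5.4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used in magic: an LKS is represented as a plain dictionary with hypothetical keys `states`, `init`, `label` and `trans`, each clause is a pair `(head, body)` standing for "WP(head) implies the disjunction of WP over body", and an empty body stands for the unit clause $\neg WP(head)$.

```python
from itertools import product

def generate_horn(im, sp):
    """Sketch of GenerateHORN: returns (transition_clauses, initial_clause).

    im, sp: dicts with keys 'states' (list), 'init' (one state),
    'label' (state -> frozenset) and 'trans' (list of (src, action, dst)).
    This representation is an assumption made for the example.
    """
    clauses = []
    for s_im, s_sp in product(im["states"], sp["states"]):
        for (src, action, t_im) in im["trans"]:
            if src != s_im:
                continue
            # PSucc_Sp(s_sp, action, L_Im(t_im)): successors of s_sp on the
            # same action that carry the same propositional label as t_im.
            psucc = [t_sp for (u, b, t_sp) in sp["trans"]
                     if u == s_sp and b == action
                     and sp["label"][t_sp] == im["label"][t_im]]
            # One transition clause per (s_im, s_sp, action, t_im).
            clauses.append(((s_im, s_sp), [(t_im, t_sp) for t_sp in psucc]))
    # The initial clause asserts WP(Init_Im, Init_Sp).
    initial = (im["init"], sp["init"])
    return clauses, initial
```

The number of clauses produced is the number of Im transitions multiplied by the number of Sp states, which is consistent with the formula size being polynomial in the two LKSs.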

5.4.3  Computing  WinPos  and  Chal 

Recall that in order to check simulation between Im and Sp, we first construct an
N-HORNSAT formula $\phi(Im, Sp)$ such that $\phi(Im, Sp)$ is satisfiable iff $Im \preceq Sp$. In this
section we describe an algorithm to check for the satisfiability of $\phi(Im, Sp)$. We also
describe a procedure to compute the set of winning positions WinPos and associated
challenge information Chal that can be used subsequently by ComputeStrategy to
compute a Counterexample Tree.

In the rest of this section we shall denote $\phi(Im, Sp)$ as simply $\phi$. The satisfiability
check occurs in two phases. In the first phase, a directed hypergraph, $H_\phi$, is
constructed on the basis of the clauses in $\phi$. The nodes of $H_\phi$ correspond to the
Boolean variables in $\phi$. We shall denote the node corresponding to Boolean variable
b as simply $\mathcal{N}_b$. Additionally there are two special nodes called $\mathcal{N}_{true}$ and $\mathcal{N}_{false}$.
The edges of $H_\phi$ are constructed as follows:

• For each clause of the form $\neg b$ in $\phi$, we add a hyper-edge from the hyper-node
$\{\mathcal{N}_{false}\}$ to node $\mathcal{N}_b$.

• For each clause of the form $(b_1 \vee \cdots \vee b_k)$ in $\phi$, we add a hyper-edge from the
hyper-node $\{\mathcal{N}_{b_1}, \ldots, \mathcal{N}_{b_k}\}$ to node $\mathcal{N}_{true}$.

• Finally, for each clause of the form $(\neg b_0 \vee b_1 \vee \cdots \vee b_k)$ in $\phi$, we add a hyper-edge
from the hyper-node $\{\mathcal{N}_{b_1}, \ldots, \mathcal{N}_{b_k}\}$ to node $\mathcal{N}_{b_0}$.

Essentially the edges of $H_\phi$ represent the logical flow of falsehood as forced by the
clauses of $\phi$. Suppose we define the notion of reachability of nodes in $H_\phi$ from $\mathcal{N}_{false}$
as follows: (i) $\mathcal{N}_{false}$ is reachable from $\mathcal{N}_{false}$, (ii) a hyper-node $\{\mathcal{N}_{b_1}, \ldots, \mathcal{N}_{b_k}\}$
is reachable from $\mathcal{N}_{false}$ iff each of the nodes $\mathcal{N}_{b_1}, \ldots, \mathcal{N}_{b_k}$ is reachable from $\mathcal{N}_{false}$,
and (iii) a node n is reachable from $\mathcal{N}_{false}$ iff there is a hyper-node h such that h is
reachable from $\mathcal{N}_{false}$ and there is a hyper-edge from h to n.

In the second phase of our N-HORNSAT satisfiability algorithm, we compute the
set of nodes of $H_\phi$, denoted by Reach, that are reachable from $\mathcal{N}_{false}$. Reach can
be computed using linear time and space in the size of $H_\phi$ (and hence $\phi$). It can
be shown that a node $\mathcal{N}_{WP(s_{Im}, s_{Sp})}$ belongs to Reach iff in order to satisfy $\phi$ the
variable $WP(s_{Im}, s_{Sp})$ must be assigned false. As a consequence, $\phi$ is satisfiable
iff $\mathcal{N}_{true} \notin Reach$. In addition, Reach has the following significance. Recall that
the Boolean variables in $\phi$ are of the form $WP(s_{Im}, s_{Sp})$. It can be shown that the
following holds:

$$\forall s_{Im} \in S_{Im} \,.\, \forall s_{Sp} \in S_{Sp} \,.\, \mathcal{N}_{WP(s_{Im}, s_{Sp})} \in Reach \iff (s_{Im}, s_{Sp}) \in WinPos$$

In other words, the elements in $Reach \setminus \{\mathcal{N}_{true}, \mathcal{N}_{false}\}$ are exactly those nodes
that correspond to Boolean variables $WP(s_{Im}, s_{Sp})$ such that $(s_{Im}, s_{Sp})$ is a winning
position. Therefore, once Reach has been computed it is trivial to compute WinPos.
To compute Chal we note the following. Suppose that a node $\mathcal{N}_{WP(s_{Im}, s_{Sp})}$ gets added
to Reach at some point. This means that the following two conditions must hold:

CH1 A transition clause of the following form was added to $\phi$ at line 3 of
Procedure 5.4.

$$WP(s_{Im}, s_{Sp}) \implies \bigvee_{s'_{Sp} \in PSucc_{Sp}(s_{Sp}, \alpha, L_{Im}(s'_{Im}))} WP(s'_{Im}, s'_{Sp})$$

CH2 Every node of $H_\phi$ of the form $\mathcal{N}_{WP(s'_{Im}, s'_{Sp})}$ such that $s'_{Sp} \in
PSucc_{Sp}(s_{Sp}, \alpha, L_{Im}(s'_{Im}))$ must already be contained in Reach. In other words
every such $(s'_{Im}, s'_{Sp})$ must be a winning position.

From conditions CH1 and CH2 above, it is clearly appropriate to set
$Chal(s_{Im}, s_{Sp}) := (s_{Im}, s_{Sp}) \xrightarrow{\alpha} (s'_{Im}, ?)$. Therefore, as soon as $\mathcal{N}_{WP(s_{Im}, s_{Sp})}$ gets
added to Reach, one can compute the clause postulated by condition CH1 above
and set $Chal(s_{Im}, s_{Sp})$ appropriately using this clause. Since this can be done for
every node added to Reach, we can effectively compute the challenge information
Chal associated with every winning position in WinPos.
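The reachability computation described above can be sketched as a counter-based propagation, which is one standard way of obtaining linear time for Horn-style formulas. The sketch below is an illustration under assumptions, not the extended Ausiello-Italiano algorithm implemented in magic: clauses are `(head, body)` pairs meaning "head implies the disjunction of body", a variable is "forced false" exactly when its node would be in Reach, and `chal` records the index of the clause responsible (the clause postulated by CH1).

```python
from collections import defaultdict, deque

def solve_nhorn(clauses, initial):
    """Return (satisfiable, forced_false, chal).

    clauses: list of (head, body); body is a list of variables.
    initial: the variable asserted true by the initial clause.
    forced_false plays the role of Reach (minus the two special nodes).
    """
    # Number of distinct body variables not yet known to be false.
    remaining = [len(set(body)) for _, body in clauses]
    watch = defaultdict(list)  # variable -> indices of clauses mentioning it in the body
    for i, (_, body) in enumerate(clauses):
        for v in set(body):
            watch[v].append(i)

    forced_false, chal = set(), {}
    # Clauses with an empty body are unit clauses (not head).
    queue = deque(i for i, r in enumerate(remaining) if r == 0)
    while queue:
        i = queue.popleft()
        head = clauses[i][0]
        if head in forced_false:
            continue
        forced_false.add(head)
        chal[head] = i  # clause whose whole body is already false (CH1 and CH2)
        for j in watch[head]:
            remaining[j] -= 1
            if remaining[j] == 0:
                queue.append(j)
    # The formula is unsatisfiable exactly when the initial variable is forced false.
    return initial not in forced_false, forced_false, chal
```

Each clause is enqueued at most once and each body occurrence is decremented at most once, so the work is proportional to the total size of the clauses.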


5.5  Witnesses  as  Counterexamples 

Counterexample  Trees  provide  a  natural  notion  of  counterexamples  to  simulation 
conformance.  However,  we  introduce  witness  LKSs  since  they  enable  us  to  prove 
some  key  results  more  easily.  In  the  rest  of  this  thesis  we  will  refer  to  witness 
LKSs  as  Counterexample  Witnesses.  Recall  from  Theorem  2  that  a  Counterexample 
Witness to $Im \not\preceq Sp$ is an LKS CW such that: (i) $CW \preceq Im$ and (ii) $CW \not\preceq Sp$.
Fortunately  a  Counterexample  Witness  can  be  obtained  from  a  Counterexample 
Tree  in  a  straightforward  manner  using  the  recursive  algorithm  TreeToWitness, 
presented  in  Procedure  5.5. 

Procedure 5.5 TreeToWitness computes a Counterexample Witness corresponding to a
Counterexample Tree CT.

Algorithm TreeToWitness(CT, n, Im, s)
let $CT = (N, E, r, St, Ch)$ and $Ch(n) = (s_{Im}, s_{Sp}) \xrightarrow{\alpha} (s'_{Im}, ?)$;
create state $s'$;
$S := \{s'\}$; $T := \{s \xrightarrow{\alpha} s'\}$; $L(s') := L_{Im}(s'_{Im})$;
for each $c \in Child(n)$
  $(S', T', L')$ := TreeToWitness(CT, c, Im, $s'$);
  $S := S \cup S'$; $T := T \cup T'$; $L := L \cup L'$;
return $(S, T, L)$;

The inputs to TreeToWitness are a Counterexample Tree CT, a node n of CT,
the implementation Im, and a state s. TreeToWitness computes and returns the
set of states, transitions and propositional labellings of the Counterexample Witness
corresponding to the subtree of CT rooted at n. Intuitively the state s can be
viewed as the initial state of the computed Counterexample Witness. We note that any
Counterexample Witness is a tree if we view its states as nodes and its transitions as
edges.
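A minimal Python sketch of the recursion in TreeToWitness is given below. The data layout is an assumption made for the example: `children` maps a tree node to its child nodes, `challenge` maps a node n to the pair $(\alpha, s'_{Im})$ taken from Ch(n), `im_label` is $L_{Im}$, and `fresh` is an iterator that supplies new witness states.

```python
from itertools import count

def tree_to_witness(children, challenge, im_label, n, s, fresh):
    """Return (states, transitions, labels) of the witness fragment for the
    subtree rooted at n, attached below the already existing state s."""
    alpha, s_im_next = challenge[n]
    s_new = next(fresh)                       # "create state s'"
    states = {s_new}
    trans = {(s, alpha, s_new)}               # s --alpha--> s'
    label = {s_new: im_label[s_im_next]}      # L(s') := L_Im(s'_Im)
    for c in children.get(n, []):
        sub_states, sub_trans, sub_label = tree_to_witness(
            children, challenge, im_label, c, s_new, fresh)
        states |= sub_states
        trans |= sub_trans
        label.update(sub_label)
    return states, trans, label
```

Because every call creates exactly one new state and attaches it below the state passed in, the result is a tree with one witness state per node of the Counterexample Tree, plus the initial state supplied by the caller.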


Procedure 5.6 SimulWitness checks for simulation, and returns a Counterexample
Witness in case of violation.

Algorithm SimulWitness(Im, Sp)
if (SimulCETree(Im, Sp) = "$Im \preceq Sp$") return "$Im \preceq Sp$";
else let CT := SimulCETree(Im, Sp); // r denotes the root of CT
  create state init; $(S, T, L)$ := TreeToWitness(CT, r, Im, init);
  $S := S \cup \{init\}$; $L(init) := L_{Im}(Init_{Im})$;
  return $(S, \{init\}, AP_{Im}, L, \Sigma_{Im}, T)$;


Algorithm SimulWitness, presented in Procedure 5.6, is similar to
SimulCETree except that it returns a Counterexample Witness as a counterexample.
In fact, it first invokes SimulCETree. If SimulCETree returns "$Im \preceq Sp$", so
does SimulWitness. Otherwise, it invokes TreeToWitness to compute and return
the Counterexample Witness corresponding to the Counterexample Tree returned by
SimulCETree. The following two results prove the correctness of SimulWitness.


[Figure 5.3 panels: Impl, Spec]

Figure 5.3: An implementation Im and a specification Sp. Im is the LKS from Figure 4.2.

Example 14 Figure 5.3 once again shows the LKS Im obtained by predicate
abstraction in Chapter 4 (cf. Figure 4.2). It also shows a specification LKS Sp. Note
that $Im \not\preceq Sp$. Figure 5.4 shows a Counterexample Tree returned by SimulCETree
when invoked with Im and Sp and also the Counterexample Witness obtained from
the Counterexample Tree by invoking TreeToWitness. For ease of understanding,
each state of the Counterexample Witness is labeled by the corresponding state of Im
which simulates it.

[Figure 5.4 panels: Counterexample Tree, Counterexample Witness]

Figure 5.4: A Counterexample Tree and Counterexample Witness corresponding to
the simulation game between Im and Sp from Figure 5.3. Each state of the
Counterexample Witness is labeled by the corresponding state of Im which simulates it.

Theorem 10 Let CW be a Counterexample Witness returned by
SimulWitness(Im, Sp). Then the following holds: (i) $CW \preceq Im$, and (ii)
$CW \not\preceq Sp$.

Proof. Recall that SimulWitness invokes TreeToWitness in order to construct
the Counterexample Witness CW. We begin by defining a mapping $\mathcal{H} : S_{CW} \to S_{Im}$,
from the states of CW to the states of the implementation Im as follows. Suppose
TreeToWitness was invoked by SimulWitness with arguments (CT, n, Im, s)
where CT is a Counterexample Tree for Game(Im, Sp). Let $CT = (N, E, r, St, Ch)$
and $St(n) = (s_{Im}, s_{Sp})$. Then $\mathcal{H}(s) = s_{Im}$. It is easy to see that $\mathcal{H}$ is well-defined.
Further one can show that $\mathcal{H}$ is also an abstraction mapping. This completes the
proof of $CW \preceq Im$.

To prove that $CW \not\preceq Sp$ we show how to create a Counterexample Tree
$CT' = (N', E', r', St', Ch')$ for Game(CW, Sp). This is done on the basis of the
Counterexample Tree $CT = (N, E, r, St, Ch)$ for Game(Im, Sp). Formally, the
components of $CT'$ obey the following constraints:

• The nodes, edges and root of $CT'$ are the same as those of CT.

$$N' = N \qquad E' = E \qquad r' = r$$

• The state labeling of $CT'$ is defined as follows. Suppose TreeToWitness was
invoked with arguments (CT, n, Im, s). Recall that $CT = (N, E, r, St, Ch)$.
Let $St(n) = (s_{Im}, s_{Sp})$. Then $St'(n) = (s, s_{Sp})$.

• The challenge labeling of $CT'$ is defined as follows. Suppose TreeToWitness
was invoked by SimulWitness with arguments (CT, n, Im, s). Recall that
$CT = (N, E, r, St, Ch)$. Suppose $Ch(n) = (s_{Im}, s_{Sp}) \xrightarrow{\alpha} (s'_{Im}, ?)$ and $s'$ was
the new state created during this invocation. Then $Ch'(n) = (s, s_{Sp}) \xrightarrow{\alpha} (s', ?)$.

Finally, we show that $CT'$ is a valid Counterexample Tree for Game(CW, Sp).
This can be done by showing that $CT'$ satisfies conditions CE1-CE3 described in
Section 5.2.


□ 


Theorem  11  Algorithm  SimulWitness  is  correct. 


Proof.  By  Theorem  9  and  Theorem  10. 


□

Chapter 6


Refinement 


In  this  chapter  we  describe  the  process  of  counterexample  validation  and  abstraction 
refinement  in  the  context  of  simulation.  Once  a  Counterexample  Witness  CW 
has  been  constructed,  we  need  to  perform  two  steps:  (i)  check  if  CW  is  a  valid 
Counterexample  Witness,  and  (ii)  if  CW  is  spurious  then  refine  Im  so  as  to  prevent 
CW  from  reappearing  in  future  iterations.  We  now  describe  these  two  steps  in  more 
detail.  We  end  this  chapter  with  a  description  of  the  complete  CEGAR  algorithm  in 
the  context  of  simulation  conformance. 


6.1  Witness  Validation 

Recall that checking the validity of CW means verifying whether CW is a valid
Counterexample Witness for $Game([\![\mathcal{P}]\!]_\Gamma, Sp)$, where $[\![\mathcal{P}]\!]_\Gamma$ is the concrete program
semantics and Sp is the specification. This means we have to show that the
following two conditions are satisfied: (i) $CW \preceq [\![\mathcal{P}]\!]_\Gamma$ and (ii) $CW \not\preceq Sp$. Since
CW is a Counterexample Witness for Game(Im, Sp), the condition $CW \not\preceq Sp$ is
automatically satisfied. Hence we only have to verify that $CW \preceq [\![\mathcal{P}]\!]_\Gamma$.

In this section we present a compositional (component-wise) algorithm to achieve
this goal. We assume that $\mathcal{P} = \langle C_1, \ldots, C_n \rangle$ and that $\Gamma = \langle \gamma_1, \ldots, \gamma_n \rangle$ is the context
for $\mathcal{P}$ which was used for the simulation check. We begin with the notion of projection
of an LKS on an alphabet.

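Definition 19 (LKS Projection) Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS such
that $\tau \notin \Sigma$, and $\Sigma' \subseteq \Sigma$ be an alphabet. Then the projection of M on $\Sigma'$, denoted by
$M \upharpoonright \Sigma'$, is the LKS $M' = (S, Init, AP, L', \Sigma' \cup \{\tau\}, T')$ such that $L'$ and $T'$ are defined
as follows:

• $\forall s \in S \,.\, L'(s) = L(s) \cap AP$

• $\forall (s, \alpha, s') \in T \,.\, \alpha \in \Sigma' \implies (s, \alpha, s') \in T'$

• $\forall (s, \alpha, s') \in T \,.\, \alpha \notin \Sigma' \implies (s, \tau, s') \in T'$

Note that $M \upharpoonright \Sigma'$ has the same states, initial states and atomic propositions as M.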
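Definition 19 amounts to relabeling every transition whose action lies outside the chosen alphabet. The sketch below illustrates this on an explicit dictionary representation of an LKS; the keys `states`, `init`, `label`, `alphabet`, `trans` and the string `"tau"` for the silent action are assumptions of the example.

```python
TAU = "tau"  # assumed name for the silent action

def project(lks, sub_alphabet):
    """Project an LKS onto sub_alphabet: actions outside it become TAU.

    States, initial states and labels are shared with the input LKS;
    only the alphabet and the transition relation change.
    """
    sub_alphabet = set(sub_alphabet)
    # Preconditions from the definition: tau is fresh and sub_alphabet is a subset.
    assert TAU not in lks["alphabet"]
    assert sub_alphabet <= set(lks["alphabet"])
    projected = dict(lks)  # shallow copy; the input is not modified
    projected["alphabet"] = sub_alphabet | {TAU}
    projected["trans"] = {
        (s, a if a in sub_alphabet else TAU, t) for (s, a, t) in lks["trans"]
    }
    return projected
```

Since only transition labels change, a tree-shaped LKS stays tree-shaped under projection.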

Let $\gamma = (InitCond, AP, \Sigma, Silent, FSM)$ be a context for a component C. Then
we write $M \upharpoonright \gamma$ to mean $M \upharpoonright (AP \cup \{Silent\})$. Let CW be a Counterexample Witness
LKS. Intuitively, the projection $CW \upharpoonright \gamma$ retains the contribution of C toward CW and
eliminates the contributions of the other components. We note that since CW has
a tree structure, so does $CW \upharpoonright \gamma$. Also note that $\tau \notin \Sigma_{CW}$ but $\tau \in \Sigma_{CW \upharpoonright \gamma}$. This
fact enables us to derive Theorem 12 which will allow us to verify $CW \preceq [\![\mathcal{P}]\!]_\Gamma$ in a
component-wise manner using weak simulation and the projections of CW. Recall
that we write $M_1 \precsim M_2$ to mean that LKS $M_1$ is weakly simulated by LKS $M_2$.

Theorem 12 (Compositional Validation) Let $M_1 = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$
and $M_2 = (S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$ be two LKSs such that $\tau \notin \Sigma_1$ and $\tau \notin \Sigma_2$.
Let $M = (S, Init, AP, L, \Sigma, T)$ be another LKS such that $\Sigma = \Sigma_1 \cup \Sigma_2$. Then the
following holds:

$$M \preceq M_1 \parallel M_2 \iff ((M \upharpoonright \Sigma_1) \precsim M_1) \wedge ((M \upharpoonright \Sigma_2) \precsim M_2)$$

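Proof. For the forward implication, let $\mathcal{R} \subseteq S \times (S_1 \times S_2)$ be a simulation relation
such that:

$$\forall s \in Init \,.\, \exists s_1 \in Init_1 \,.\, \exists s_2 \in Init_2 \,.\, s \mathcal{R} (s_1, s_2) \tag{6.1}$$

Recall that the set of states of $M \upharpoonright \Sigma_1$ is S and that the set of initial states of
$M \upharpoonright \Sigma_1$ is Init. Define relation $\mathcal{R}_1 \subseteq S \times S_1$ as follows:

$$\mathcal{R}_1 = \{(s, s_1) \mid \exists s_2 \,.\, s \mathcal{R} (s_1, s_2)\} \tag{6.2}$$

From 6.1 and 6.2 we have directly:

$$\forall s \in Init \,.\, \exists s_1 \in Init_1 \,.\, (s, s_1) \in \mathcal{R}_1 \tag{6.3}$$

Now we need to prove that $\mathcal{R}_1$ is a weak simulation. Consider any two states
$s \in S$ and $s_1 \in S_1$ such that $(s, s_1) \in \mathcal{R}_1$. From 6.2 we know that:

$$\exists s_2 \,.\, s \mathcal{R} (s_1, s_2) \tag{6.4}$$

Suppose that $M \upharpoonright \Sigma_1$ contains the following transition where $\alpha \in \Sigma_1$:

$$s \xrightarrow{\alpha} s' \tag{6.5}$$

The following commutative diagram explains the basic idea behind the proof.

[Commutative diagram omitted.]

From 6.5 and Definition 19 we can conclude that M contains the following
transition:

$$s \xrightarrow{\alpha} s' \tag{6.6}$$

Since $\mathcal{R}$ is a simulation relation, from 6.4 and 6.6 we know that:

$$\exists s'_1 \in S_1 \,.\, \exists s'_2 \in S_2 \,.\, (s_1, s_2) \xrightarrow{\alpha} (s'_1, s'_2) \wedge s' \mathcal{R} (s'_1, s'_2) \tag{6.7}$$

From 6.2 and the fact that $s' \mathcal{R} (s'_1, s'_2)$ (cf. 6.7), we have:

$$(s', s'_1) \in \mathcal{R}_1 \tag{6.8}$$

Then from the fact that $(s_1, s_2) \xrightarrow{\alpha} (s'_1, s'_2)$ (cf. 6.7) and that $\alpha \in \Sigma_1$ we have:

$$s_1 \xrightarrow{\alpha} s'_1 \tag{6.9}$$

From 6.8 and 6.9 we conclude that $\mathcal{R}_1$ is a weak simulation relation.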

Now suppose that $M \upharpoonright \Sigma_1$ contains the following transition:

$$s \xrightarrow{\tau} s' \tag{6.10}$$

From 6.10 and Definition 19 we can conclude that there exists $\beta \notin \Sigma_1$ such that
M contains the following transition:

$$s \xrightarrow{\beta} s' \tag{6.11}$$

The following commutative diagram explains the basic idea behind the proof.

[Commutative diagram omitted.]

Since $\mathcal{R}$ is a simulation relation, from 6.4 and 6.11 and the fact that $\beta \notin \Sigma_1$, and
the definition of parallel composition we know that:

$$\exists s'_2 \in S_2 \,.\, (s_1, s_2) \xrightarrow{\beta} (s_1, s'_2) \wedge s' \mathcal{R} (s_1, s'_2) \tag{6.12}$$

From 6.2 and the fact that $s' \mathcal{R} (s_1, s'_2)$ (cf. 6.12), we have:

$$(s', s_1) \in \mathcal{R}_1 \tag{6.13}$$

From 6.13 we can again conclude that $\mathcal{R}_1$ is a weak simulation relation. This
completes the proof that $(M \upharpoonright \Sigma_1) \precsim M_1$. We can show in a similar manner that
$(M \upharpoonright \Sigma_2) \precsim M_2$ and hence we have the proof of the forward implication.

For the reverse implication let $\mathcal{R}_1 \subseteq S \times S_1$ be a weak simulation relation such
that:

$$\forall s \in Init \,.\, \exists s_1 \in Init_1 \,.\, (s, s_1) \in \mathcal{R}_1 \tag{6.14}$$

Similarly let $\mathcal{R}_2 \subseteq S \times S_2$ be a weak simulation relation such that:

$$\forall s \in Init \,.\, \exists s_2 \in Init_2 \,.\, (s, s_2) \in \mathcal{R}_2 \tag{6.15}$$

Define relation $\mathcal{R} \subseteq S \times (S_1 \times S_2)$ as follows:

$$\mathcal{R} = \{(s, (s_1, s_2)) \mid (s, s_1) \in \mathcal{R}_1 \wedge (s, s_2) \in \mathcal{R}_2\} \tag{6.16}$$

From 6.14, 6.15 and 6.16 we get immediately the following:

$$\forall s \in Init \,.\, \exists s_1 \in Init_1 \,.\, \exists s_2 \in Init_2 \,.\, s \mathcal{R} (s_1, s_2) \tag{6.17}$$

Now we need to show that $\mathcal{R}$ is a simulation relation. Consider any states $s \in S$,
$s_1 \in S_1$ and $s_2 \in S_2$ such that:

$$s \mathcal{R} (s_1, s_2) \tag{6.18}$$

From 6.16 and 6.18 we can conclude that:

$$(s, s_1) \in \mathcal{R}_1 \wedge (s, s_2) \in \mathcal{R}_2 \tag{6.19}$$

Now suppose that M contains the following transition:

$$s \xrightarrow{\alpha} s' \tag{6.20}$$

Then we need to show the following:

$$\exists s'_1 \in S_1 \,.\, \exists s'_2 \in S_2 \,.\, (s_1, s_2) \xrightarrow{\alpha} (s'_1, s'_2) \wedge s' \mathcal{R} (s'_1, s'_2) \tag{6.21}$$

From 6.16 and 6.21 it is clear that we need to find an $s'_1 \in S_1$ and an $s'_2 \in S_2$ such
that the following holds:

$$(s_1, s_2) \xrightarrow{\alpha} (s'_1, s'_2) \wedge (s', s'_1) \in \mathcal{R}_1 \wedge (s', s'_2) \in \mathcal{R}_2 \tag{6.22}$$

We will first show the existence of such an $s'_1$. Suppose $\alpha \in \Sigma_1$. Then from
Definition 19 and 6.20 we know that $M \upharpoonright \Sigma_1$ contains the following transition:

$$s \xrightarrow{\alpha} s' \tag{6.23}$$

The following commutative diagram explains the basic idea behind the proof.

[Commutative diagram omitted.]

Since $\mathcal{R}_1$ is a weak simulation relation and since $(s, s_1) \in \mathcal{R}_1$ (cf. 6.19) we have
from 6.23:

$$\exists s'_1 \in S_1 \,.\, s_1 \xrightarrow{\alpha} s'_1 \wedge (s', s'_1) \in \mathcal{R}_1 \tag{6.24}$$

Clearly the $s'_1$ above meets the requirement of 6.22.

Now suppose that $\alpha \notin \Sigma_1$. Then from Definition 19 and 6.20 we know that $M \upharpoonright \Sigma_1$
contains the following transition:

$$s \xrightarrow{\tau} s' \tag{6.25}$$

The following commutative diagram explains the basic idea behind the proof.

[Commutative diagram omitted.]

Since $\mathcal{R}_1$ is a weak simulation relation and since $(s, s_1) \in \mathcal{R}_1$ (cf. 6.19) and since
$\tau \notin \Sigma_1$, we have from 6.25:

$$(s', s_1) \in \mathcal{R}_1 \tag{6.26}$$

Then clearly $s_1$ itself can be used as the $s'_1$ that meets the requirement of 6.22.
Thus we have shown the existence of an $s'_1$ which satisfies 6.22 irrespective of whether
$\alpha \in \Sigma_1$ or not. In a completely symmetric manner we can show the existence of an
$s'_2$ which satisfies 6.22 irrespective of whether $\alpha \in \Sigma_2$ or not. Thus we have shown
that $\mathcal{R}$ is a simulation relation. This completes the proof of the reverse implication
and hence of the theorem.


□ 

Theorem 12 essentially allows us to discharge a simulation obligation by
performing weak simulation checks between Counterexample Witness projections and
components. It is easy to see that due to the associativity and commutativity of
parallel composition, Theorem 12 can be extended to any finite number of LKSs. In
other words, Theorem 12 still holds if we replace $\langle M_1, M_2 \rangle$ with any finite sequence
of LKSs $\langle M_1, \ldots, M_n \rangle$.
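Written out, the n-component form of Theorem 12 reads as follows. The notation here is a reconstruction: $\preceq$ stands for simulation, $\precsim$ for weak simulation and $\upharpoonright$ for projection, and the side conditions are the natural generalizations of those in the two-component statement.

```latex
% Assumed side conditions: \tau \notin \Sigma_i for every i, and
% \Sigma = \Sigma_1 \cup \cdots \cup \Sigma_n is the alphabet of M.
\[
  M \preceq M_1 \parallel \cdots \parallel M_n
  \iff
  \bigwedge_{i=1}^{n} \bigl( (M \upharpoonright \Sigma_i) \precsim M_i \bigr)
\]
```

For n = 2 this is exactly the statement of Theorem 12.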

Algorithm WeakSimul, presented in Procedure 6.1, takes as input a
Counterexample Witness projection CW, a component C and a context $\gamma$. Recall that
$[\![C]\!]_\gamma$ denotes the concrete semantics of C with respect to $\gamma$. Algorithm WeakSimul
returns true if $CW \precsim [\![C]\!]_\gamma$ and false otherwise.

Given two LKSs $M_1 = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$ and $M_2 =
(S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$, we say that a state $s_2 \in S_2$ (weakly) simulates a
state $s_1 \in S_1$ iff there exists a (weak) simulation relation $\mathcal{R} \subseteq S_1 \times S_2$ such
that $s_1 \mathcal{R} s_2$. Intuitively, WeakSimul invokes algorithm CanSimul (presented in
Procedure 6.2) to compute the set of states of $[\![C]\!]_\gamma$ that can weakly simulate the
initial state of CW. Then $CW \precsim [\![C]\!]_\gamma$ iff some initial state of $[\![C]\!]_\gamma$ can weakly
simulate the initial state of CW.

Procedure 6.1 WeakSimul returns true iff $CW \precsim [\![C]\!]_\gamma$.

Algorithm WeakSimul(CW, C, $\gamma$)
- CW : is a Counterexample Witness
- C : is a component, $\gamma$ : is a context for C
let $CW = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$;
let $[\![C]\!]_\gamma = (S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$;
// $[\![C]\!]_\gamma$ is the concrete semantics of C with respect to $\gamma$
S := CanSimul(CW, $Init_1$, C, $\gamma$);
// S = states of $[\![C]\!]_\gamma$ which can weakly simulate initial state of CW
return $(S \cap Init_2) \neq \emptyset$;

The recursive algorithm CanSimul takes as input a projected Counterexample
Witness CW, a state s of CW, a component C, and a context $\gamma$ for C. Recall that
CW has a tree structure and hence no marking of the states of CW is required to
avoid revisiting them. CanSimul computes the set of states of $[\![C]\!]_\gamma$ which can weakly
simulate the sub-LKS of CW with initial state s. It manipulates sets of states of $[\![C]\!]_\gamma$
using the symbolic techniques presented in Section 3.4. In particular it uses the
functions PreImage and Restrict to compute pre-images and restrict sets of states
with respect to propositions.
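To make the recursion concrete, here is a small explicit-state sketch of CanSimul in Python. It is an illustration only: the text above manipulates sets of concrete states symbolically, whereas this sketch enumerates them, and the parameter names as well as the string `"tau"` for the silent action are assumptions. A $\tau$ step of the witness is treated as requiring no move from the component, which is how the "non-$\tau$ pre-image" comment in Procedure 6.2 is read here.

```python
TAU = "tau"  # assumed name for the silent action

def can_simul(cw_trans, cw_label, s, comp_states, comp_trans, comp_label):
    """States of the component that can weakly simulate the witness subtree at s.

    cw_trans / comp_trans: iterables of (src, action, dst) triples.
    cw_label / comp_label: state -> frozenset of propositions.
    The witness must be a tree, so the recursion terminates.
    """
    # Restrict: component states carrying the same propositions as s.
    result = {q for q in comp_states if comp_label[q] == cw_label[s]}
    for (src, action, nxt) in cw_trans:
        if src != s:
            continue
        sub = can_simul(cw_trans, cw_label, nxt,
                        comp_states, comp_trans, comp_label)
        if action != TAU:
            # PreImage: states with an action-successor inside sub.
            sub = {q for (q, b, q2) in comp_trans if b == action and q2 in sub}
        result &= sub
    return result
```

A WeakSimul-style check then intersects the result for the witness's initial state with the component's initial states and tests for non-emptiness.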

Theorem 13 Algorithms CanSimul and WeakSimul are correct.

Proof. From the definition of weak simulation, the correctness of Restrict and
PreImage, and the fact that $\tau \notin \Sigma_{[\![C]\!]_\gamma}$.


□ 

Procedure 6.2 CanSimul computes the set of states of $[\![C]\!]_\gamma$ which can weakly simulate
the sub-LKS of CW with initial state s.

Algorithm CanSimul(CW, s, C, $\gamma$)
- CW : is a Counterexample Witness, s : is a state of CW
- C : is a component, $\gamma$ : is a context for C
let $CW = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$;
let $[\![C]\!]_\gamma = (S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$;
// $[\![C]\!]_\gamma$ is the concrete semantics of C with respect to $\gamma$
S := Restrict($S_2$, $L_1(s)$);
// S is the subset of $S_2$ with same propositional labeling as s
for each $s \xrightarrow{\alpha} s' \in T_1$ // s' is a successor state of s
  S' := CanSimul(CW, s', C, $\gamma$); // compute result for successor
  if ($\alpha \neq \tau$) then S' := PreImage(S', $\alpha$); // take non-$\tau$ pre-image
  S := $S \cap S'$; // update result
return S;

Note that the ability to validate a Counterexample Witness using its projections
enables us to avoid exploring the state-space of $\mathcal{P}$. Not only does this compositional
approach for Counterexample Witness validation help us avoid state-space explosion,
it also identifies the particular component whose abstraction has to be refined in order
to eliminate a spurious Counterexample Witness.


In particular, suppose that CW is a spurious Counterexample Witness. Recall
that our program consists of n components $\{C_1, \ldots, C_n\}$. Then according to
Theorem 12 there exists a minimum $i \in \{1, \ldots, n\}$ such that the projection of CW
on $\gamma_i$ is not weakly simulated by the concrete semantics of $C_i$. Therefore, we can
eliminate CW by refining our abstraction for $C_i$ to obtain a new abstraction Abs such
that the projection of CW on $\gamma_i$ is not weakly simulated by Abs. This refinement
process is presented in the next section.

Example 15 Figure 6.1 shows the component C and the Counterexample Witness
CW from our running example. Recall that the actions $\alpha$, $\beta$, $\chi$ and $\delta$ are performed
by the library routines alpha, beta, chi and delta respectively. Since there is only one
component, the projection of CW is CW itself. Note that CW is not weakly simulated
by $[\![C]\!]_\gamma$. Intuitively this is because CW can perform actions $\alpha$ and $\delta$ along the two
branches of its computation tree but $[\![C]\!]_\gamma$ cannot. This is because for $[\![C]\!]_\gamma$ to perform
$\alpha$ and $\delta$, variable x has to be true while variable y has to be false. However this is
clearly impossible due to the initial assignment of y to x. Therefore CW is a spurious
counterexample.

[Figure 6.1 panels: Component, Witness CW]

Figure 6.1: The component from Figure 4.1 and Counterexample Witness from Figure 5.4.
Note that the Counterexample Witness is spurious.

6.2  Abstraction  Refinement 

Let C be a component, $\gamma$ be a context for C and $\mathcal{CW}$ be a set of projections of
Counterexample Witnesses on $\gamma$. Recall, from Section 4.5, that in our framework a
predicate  abstraction  is  determined  by  a  predicate  mapping.  The  predicate  mapping, 
in  turn,  is  obtained  from  a  set  of  seed  branches  (cf.  Section  4.5)  and  a  context 
using  predicate  inference  (cf.  Section  4.5).  The  abstraction  refinement  process  is 
encapsulated  by  algorithm  AbsRefine,  presented  in  Procedure  6.3.  Essentially,  it 
works  as  follows. 

Procedure 6.3 AbsRefine returns a refined abstraction for C that eliminates a set of
spurious Counterexample Witness projections $\mathcal{CW}$, and ERROR on failure.

Algorithm AbsRefine($\mathcal{CW}$, C, $\gamma$)
- $\mathcal{CW}$ : is a set of spurious Counterexample Witnesses
- C : is a component, $\gamma$ : is a context for C
let $\mathcal{CW} = \{CW_1, \ldots, CW_k\}$;
for each $B \subseteq B_C$ // $B_C$ is the set of branches in C
  $\Pi$ := PredInfer(C, $\gamma$, B);
  // $\Pi$ is set of predicates inferred from B
  let $M := [\![C]\!]^{\Pi}_{\gamma} = (S, Init, AP, L, \Sigma, T)$;
  // M is the predicate abstraction of C using $\Pi$
  flag := TRUE;
  // flag records if B can eliminate every element of $\mathcal{CW}$
  for i = 1 to k
    let $CW_i = (S_i, Init_i, AP_i, L_i, \Sigma_i, T_i)$;
    S' := AbsCanSimul($CW_i$, $Init_i$, M);
    // S' = states of M which can weakly simulate initial state of $CW_i$
    if ($(S' \cap Init) \neq \emptyset$) then flag := FALSE;
    // $CW_i \precsim M$ and hence B cannot eliminate $CW_i$
  if flag then return M;
return ERROR;

Recall the algorithm PredInfer from Procedure 4.1 which starts with a set of
seed branches and populates each statement of a component with a set of predicates
which can be used subsequently for predicate abstraction. We consider subsets of
branches of C in increasing order of size. For each set B of branches we compute the
predicate mapping $\Pi$ = PredInfer(C, $\gamma$, B) for C. Next we compute the abstraction
$M = [\![C]\!]^{\Pi}_{\gamma}$. Let $CW_i = (S_i, Init_i, AP_i, L_i, \Sigma_i, T_i)$ for each $CW_i \in \mathcal{CW}$. Now for
each $CW_i \in \mathcal{CW}$, we invoke algorithm AbsCanSimul (presented in Procedure 6.4)
to compute the set of states of M which can weakly simulate $Init_i$. If for each
$CW_i \in \mathcal{CW}$, no initial state of M can weakly simulate $Init_i$, then for each
$CW_i \in \mathcal{CW}$, $CW_i \not\precsim M$. In this case we report M as the refined abstraction for C
and stop. Otherwise, we move on with the next set of branches under consideration.

Procedure 6.4 AbsCanSimul computes the set of states of M which can weakly simulate
the sub-LKS of CW with initial state s.

Algorithm AbsCanSimul(CW, s, M)
- CW : is a Counterexample Witness, s : is a state of CW
- M is an LKS obtained by predicate abstraction
let $CW = (S, Init, AP, L, \Sigma, T)$;
let $M = (S_M, Init_M, AP_M, L_M, \Sigma_M, T_M)$;
S' := $\{\hat{s} \in S_M \mid L_M(\hat{s}) = L(s)\}$;
// S' is the subset of $S_M$ with same propositional labeling as s
for each $s \xrightarrow{\alpha} s' \in T$ // s' is a successor state of s
  S'' := AbsCanSimul(CW, s', M); // compute result for successor
  if ($\alpha \neq \tau$) then S'' := $\{\hat{s} \in S_M \mid Succ(\hat{s}, \alpha) \cap S'' \neq \emptyset\}$; // take non-$\tau$ pre-image
  S' := $S' \cap S''$; // update result
return S';
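The outer search performed by AbsRefine, trying seed sets in increasing order of size until one yields an abstraction that rejects every spurious witness, can be sketched as below. The two callables are placeholders assumed for the example: `build_abstraction` stands in for predicate inference followed by predicate abstraction, and `weakly_simulates` stands in for the AbsCanSimul-based check against the initial states of M.

```python
from itertools import combinations

def abs_refine(branches, witnesses, build_abstraction, weakly_simulates):
    """Return the first abstraction that eliminates every witness, or None.

    branches: list of branch identifiers of the component.
    witnesses: spurious Counterexample Witness projections.
    build_abstraction(seeds) -> abstraction built from a frozenset of seeds.
    weakly_simulates(cw, m) -> True if m weakly simulates cw.
    None plays the role of ERROR in Procedure 6.3.
    """
    for size in range(len(branches) + 1):          # increasing order of size
        for seeds in combinations(branches, size):
            m = build_abstraction(frozenset(seeds))
            # m is acceptable only if it simulates none of the witnesses.
            if not any(weakly_simulates(cw, m) for cw in witnesses):
                return m
    return None
```

In the worst case this enumerates all $2^{|B_C|}$ subsets, so trying small seed sets first keeps the common case cheap and also favors abstractions with fewer predicates.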


Theorem 14 Algorithm AbsRefine is correct.

Proof. It is obvious that AbsRefine either returns ERROR or a refined abstraction
M such that $\forall i \in \{1, \ldots, k\} \,.\, CW_i \not\precsim M$.


□ 

Recall  the  component  C  and  the  spurious  CounterexampleWitness  CW  from 
Figure  6.1.  Note  that  both  branch  statements  3  (with  branch  condition  x)  and  4  (with 
branch  condition  y )  are  required  as  seeds  in  order  to  obtain  a  predicate  abstraction 
that  is  precise  enough  to  eliminate  CW.  Neither  branch  3  nor  branch  4  is  adequate 
by  itself.  This  is  because  the  spuriousness  of  CW  relies  on  the  direct  correlation 
between  the  truth  and  falsehood  of  variables  x  and  y.  Any  abstraction  must  capture 
this  correlation  in  order  to  eliminate  CW.  In  our  framework,  this  can  only  be  achieved 
by  using  both  3  and  4  as  seed  branches  for  the  predicate  inference  and  subsequent 
predicate abstraction.

Figure 6.2: On the left is the refined abstraction of the component from Figure 6.1 using states
{3,4} as seeds. The empty valuation $\bot$ is written as "()". On the right is the specification
from Figure 5.3. Note that the refined abstraction is simulated by the specification.

Example 16 Recall from Figure 4.3 the predicate mapping obtained by using states
{3,4} as seeds. Figure 6.2 shows the refined abstraction using the resulting predicate
mapping. The empty valuation $\bot$ is written as "()". It also shows the specification of
our running example from Figure 5.3. Note that the refined abstraction is simulated
by the specification.

It  is  well  known  that  simulation  is  preferred  over  trace  containment  because  it 
does  not  require  the  complementation  (and  hence  potential  exponential  blowup  in 
size)  of  the  specification.  Our  running  example  illustrates  an  additional  advantage 
of  simulation  conformance  over  trace  containment  in  the  context  of  CEGAR-based 
verification.  In  particular,  the  additional  structure  (and  hence  information)  conveyed 
by  tree  counterexamples  obtained  in  the  context  of  simulation  conformance  can  aid 
in  quicker  predicate  discovery  and  termination. 

Suppose we had attempted to check trace containment on our example. We
know that at least the two branches 3 and 4 are required for successful verification.
Also note that these two branches cannot appear simultaneously in the same trace
counterexample since they appear in disjoint fragments of the control flow of the
component. Hence we would have required at least two refinement steps in order to
successfully verify trace containment. In contrast, as we have already seen, verification
of simulation conformance requires just a single refinement step. We will provide
experimental evidence supporting this intuitive argument in Section 6.5.

Another approach to speed up the termination of the CEGAR loop is to generate
multiple  counterexamples  at  the  end  of  each  unsuccessful  verification  step.  The 
idea  is  that  more  counterexamples  convey  more  information  and  will  lead  to  quicker 
realization  of  an  abstraction  that  is  precise  enough  to  either  validate  the  existence  of 
conformance  or  yield  a  non-spurious  counterexample.  However,  manipulating  a  large 
number  of  counterexamples  is  expensive  and  will  only  provide  diminishing  returns 
beyond  a  certain  threshold.  We  will  also  provide  experimental  justification  of  this 
argument  in  Section  6.5. 

Note  that  our  algorithm  for  constructing  predicate  mappings  is  restricted  in 
the  sense  that  it  can  only  derive  predicates  from  branch  conditions.  Therefore,  in 
principle,  we  might  be  unable  to  eliminate  a  spurious  Counterexample  Witness.  In 
the  context  of  algorithm  AbsRefine,  this  means  that  we  could  end  up  trying  all 
sets of branches without finding an appropriate refined abstraction M. In such a
case  we  return  ERROR.  We  note  that  this  scenario  has  never  transpired  during  our 
experiments.  Moreover,  any  abstraction  refinement  technique  must  necessarily  suffer 
from  this  limitation  since  the  problem  we  are  attempting  to  solve  is  undecidable  in 
general. 

Also, AbsRefine attempts to eliminate a set of spurious Counterexample Witness
projections instead of a single projection. This will be necessary in the context of
the complete CEGAR algorithm presented in the next section. In fact, AbsRefine
iterates through the subsets of B_C (the set of all branches in C) in increasing order
of  size.  Therefore  the  refined  abstraction  returned  by  AbsRefine  corresponds  to  a 
minimal  set  of  branches  that  can  eliminate  the  entire  set  of  spurious  Counterexample 
Witness  projections  passed  to  it.  This  is  an  important  feature  because  the  size  of  an 
abstraction  is,  in  the  worst  case,  exponential  in  the  number  of  branches  used  in  its 
construction. 
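
The enumeration order described above can be sketched as follows. This is only an illustration under stated assumptions: the names abs_refine_sketch and eliminates are hypothetical, and the oracle eliminates stands in for the check that the abstraction built from a given set of seed branches rules out a given spurious projection. The toy oracle mirrors our running example, where the counterexample disappears only when both branches 3 and 4 are used as seeds.

```python
from itertools import combinations

def abs_refine_sketch(branches, spurious_cws, eliminates):
    """Try subsets of branches in increasing order of size and return the
    first one that eliminates every spurious counterexample.
    Returns None (playing the role of ERROR) if no subset works."""
    ordered = sorted(branches)
    for size in range(len(ordered) + 1):
        for seeds in combinations(ordered, size):
            seed_set = frozenset(seeds)
            if all(eliminates(seed_set, cw) for cw in spurious_cws):
                return seed_set
    return None

# Hypothetical oracle: a counterexample is eliminated only if the seed set
# contains every branch that it "needs".
needs = {"cw": {3, 4}}

def eliminates(seed_set, cw):
    return needs[cw] <= seed_set

result = abs_refine_sketch({1, 2, 3, 4}, ["cw"], eliminates)
print(sorted(result))  # [3, 4]
```

Because smaller subsets are always tried first, the first successful subset is minimal in size, at the price of a worst-case exponential number of oracle calls.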

However,  AbsRefine  is  also  naive  in  the  following  sense.  As  we  shall  see  shortly, 
the  set  of  Counterexample  Witness  projections  passed  to  AbsRefine  by  the  top- 
level  CEGAR  algorithm  will  increase  monotonically  across  successive  invocations. 
Nevertheless, AbsRefine naively recomputes AbsCanSimul(CW_j, Init_{CW_j}, M)
even though it might already have encountered CW_j in a previous invocation. In
Chapter  7  we  shall  present  a  more  sophisticated  approach  that  avoids  this  redundant 
computation  without  compromising  on  the  minimality  of  the  number  of  branches  used 
in  the  computation  of  the  refined  abstraction. 


6.3  CEGAR  for  Simulation 

The  complete  CEGAR  algorithm  in  the  context  of  simulation  conformance,  called 
SimulCEGAR,  is  presented  in  Procedure  6.5.  It  invokes  at  various  stages  algorithms 
PredInfer, the predicate abstraction algorithm, SimulWitness, WeakSimul and
AbsRefine. It takes as input a program V, a specification LKS Sp and a context Γ for
V and outputs either “V ≼ Sp” or “V ⋠ Sp” or ERROR. Intuitively SimulCEGAR
works as follows.

Procedure 6.5 SimulCEGAR checks simulation conformance between a program V and
a specification Sp in a context Γ.

Algorithm SimulCEGAR(V, Sp, Γ)
  - V : is a program, Γ : is a context for V
  - Sp : is a specification LKS
  let V = (C_1, ..., C_n) and Γ = (γ_1, ..., γ_n);
  for each i ∈ {1, ..., n}, Π_i := PredInfer(C_i, γ_i, ∅) and
      M_i := the predicate abstraction of C_i with respect to Π_i and γ_i, and CW_i := ∅;
    // initialize abstractions with empty set of seed branches
  forever do
    let M = M_1 || ... || M_n;
    // M is the composition of predicate abstractions
    if (SimulWitness(M, Sp) = “M ≼ Sp”) return “V ≼ Sp”;
    // if the property holds on M it also holds on V
    let CW = Counterexample Witness returned by SimulWitness;
    find i ∈ {1, ..., n} such that ¬WeakSimul(CW ↾ γ_i, C_i, γ_i);
    // check compositionally if CW is spurious
    if (no such i found) return “V ⋠ Sp”;
    // CW is valid and hence V is not simulated by Sp
    else CW_i := CW_i ∪ {CW ↾ γ_i};
    // update the set of spurious Counterexample Witnesses
    if (AbsRefine(CW_i, C_i, γ_i) = ERROR) return ERROR;
    M_i := AbsRefine(CW_i, C_i, γ_i); // refine the abstraction and repeat


Let V = (C_1, ..., C_n). Then SimulCEGAR maintains a set of abstractions
M_1, ..., M_n where M_i is a predicate abstraction of C_i for i ∈ {1, ..., n}. It also
maintains a set of spurious Counterexample Witness projections {CW_1, ..., CW_n}
which are all initialized to the empty set. Note that by Theorem 7, M = M_1 || ... ||
M_n is an abstraction of V. Initially each M_i is set to the predicate abstraction of
C_i corresponding to an empty set of seed branches. Next SimulCEGAR iteratively
performs the following steps:

1. (Verify) Invoke algorithm SimulWitness to check if M is simulated by Sp. If
SimulWitness returns “M ≼ Sp” then output “V ≼ Sp” and exit. Otherwise
let CW be the Counterexample Witness returned by SimulWitness. Go to
step 2.

2. (Validate) For i ∈ {1, ..., n} invoke WeakSimul(CW ↾ γ_i, C_i, γ_i). If every
invocation of WeakSimul returns true then CW is a valid Counterexample
Witness. In this case, output “V ⋠ Sp” and exit. Otherwise let i be the
minimal element of {1, ..., n} such that WeakSimul(CW ↾ γ_i, C_i, γ_i) returns
false. Go to step 3.

3. (Refine) Update CW_i by adding CW ↾ γ_i to it. Invoke
AbsRefine(CW_i, C_i, γ_i). If AbsRefine returns ERROR, output ERROR
and stop. Otherwise set M_i to the abstraction returned by AbsRefine. Repeat
from step 1.
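
The control flow of the three steps above can be sketched in a few lines. This is a skeleton under stated assumptions rather than the implementation in our tool: the callbacks verify, weak_simul and abs_refine are hypothetical stand-ins for SimulWitness, WeakSimul and AbsRefine, counterexamples are stored whole instead of being projected on a component, and the bound max_iters is an addition of the sketch (the procedure itself loops forever).

```python
def simul_cegar_sketch(n, initial, verify, weak_simul, abs_refine, max_iters=100):
    """Skeleton of the verify / validate / refine loop.

    initial(i)         -> initial abstraction of component i
    verify(abstrs)     -> None if the composed abstraction conforms,
                          otherwise a counterexample
    weak_simul(cw, i)  -> True if cw is realizable on component i
    abs_refine(i, cws) -> refined abstraction of component i, or None on failure
    """
    abstractions = [initial(i) for i in range(n)]
    spurious = [[] for _ in range(n)]
    for _ in range(max_iters):
        cw = verify(abstractions)                       # step 1 (Verify)
        if cw is None:
            return "conforms"
        culprit = next((i for i in range(n)             # step 2 (Validate)
                        if not weak_simul(cw, i)), None)
        if culprit is None:
            return "does not conform"
        spurious[culprit].append(cw)                    # step 3 (Refine)
        refined = abs_refine(culprit, spurious[culprit])
        if refined is None:
            return "ERROR"
        abstractions[culprit] = refined
    return "ERROR"

# Toy instance: one component whose abstraction is just its set of seed
# branches; verification succeeds once branches 3 and 4 are both seeds.
def toy_verify(abstractions):
    return None if {3, 4} <= abstractions[0] else "cw"

outcome = simul_cegar_sketch(
    n=1,
    initial=lambda i: frozenset(),
    verify=toy_verify,
    weak_simul=lambda cw, i: False,          # the counterexample is spurious
    abs_refine=lambda i, cws: frozenset({3, 4}),
)
print(outcome)  # conforms
```

Note that only the abstraction of the component on which validation failed is rebuilt in each iteration; the others are reused unchanged.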

Theorem  15  Algorithm  SimulCEGAR  is  correct. 

Proof. When SimulCEGAR returns “V ≼ Sp” its correctness follows from
Theorem 7, Theorem 11 and Theorem 1. When SimulCEGAR returns “V ⋠ Sp”
its correctness follows from Theorem 12, Theorem 13 and Theorem 14.

□ 


6.4  The  MAGIC  Tool 

We have implemented the game semantics based refinement approach within the
MAGIC [25, 80] tool. In this section we give a brief overview of how MAGIC can be
used. This is essentially a copy of the tutorial of version 1.0 of MAGIC available
online at http://www.cs.cmu.edu/~chaki/magic/tutorial-1.0.html, and should
be a good starting point.

6.4.1 A Simple Implementation


The  goal  of  MAGIC  is  to  ascertain  that  an  implementation  conforms  to  a  specification. 
The  implementation  is  always  a  (possibly  concurrent)  C  program,  with  each 
sequential  component  being  a  C  procedure.  Let  us  begin  with  a  simple  sequential 
implementation. 

int my_proc(int x)
{
    int y;

    if (x == 0) {
        y = foo();
        if (y > 0) return 10;
        else return 20;
    } else {
        y = bar();
        if (y < 0) return 30;
        else return 40;
    }
}

6.4.2  A  Simple  Specification 

We  shall  now  try  to  construct  an  appropriate  specification  for  my_proc  and  then 
verify  it.  Note  that  we  are  intentionally  proceeding  in  the  reverse  direction  for  ease 
of  understanding.  Normally,  if  standard  software  engineering  practices  have  been 
followed,  the  specification  always  comes  into  existence  before  the  implementation. 

So  what  could  be  a  good  specification  for  my_proc?  In  a  kind  of  pseudo-code,  one 
might  make  the  following  claim  about  my_proc: 


1. If the first argument to my_proc is equal to zero:

• my_proc calls the library routine foo.

• Depending on whether the value returned by foo is greater than zero or
not, my_proc returns either 10 or 20.

2. Otherwise, if the first argument to my_proc is not equal to zero:

• my_proc calls the library routine bar.

• Depending on whether the value returned by bar is less than zero or not,
my_proc returns either 30 or 40.

It  is  clear  that  any  specification  of  my_proc  must  take  into  account  its  calling 
context,  since  the  behavior  of  my_proc  is  dependent  on  the  value  of  its  first  argument. 
Further,  the  behavior  of  my_proc  in  the  first  case  (i.e.  when  its  first  argument  is  equal 
to  zero)  can  be  expressed  by  the  following  simple  LKS. 


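[Figure: the LKS S1. A call_foo transition leads from S1 to S2, from which either return {$0 == 10} or return {$0 == 20} leads to STOP.]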

Writing  down  LKSs 

MAGIC uses an extended FSP [79] notation to specify LKSs. The above LKS can be
expressed in MAGIC’s notation as follows:

S1 = ( call_foo -> S2 ),
S2 = ( return {$0 == 10} -> STOP | return {$0 == 20} -> STOP ).

Note  that  the  name  of  an  LKS  is  simply  the  name  of  its  initial  state.  Also  note 
that the transitions of the LKS are labeled with actions. The transition from the
initial state (S1) is labeled by an action call_foo. This action encapsulates the
externally  observable  event  of  library  routine  foo  being  invoked.  Such  actions,  with 
only  names,  are  also  called  basic  actions. 

Return  actions 

As  anyone  familiar  with  the  FSP  will  realize,  we  extend  the  FSP  notation  to  express 
a  special  class  of  actions  called  return  actions.  Return  actions  are  of  the  form 
return  {expression}  or  return  {}  where  the  former  expresses  the  return  of  an 
integer  value  and  the  latter  expresses  returning  a  void  value.  In  a  return  action  of  the 
form  return  {expression},  the  expression  represents  a  condition  satisfied  by  the 
return  value.  The  return  value  itself  is  represented  by  the  dummy  variable  $0.  For 
instance, the action return {$0 < 5} represents the return of an integer value less
than  5. 
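
To make the reading of basic and return actions concrete, the LKS S1 above can be encoded as a small transition table and matched against a sequence of observed events. This is a sketch under stated assumptions: the encoding and the names matches and accepts are hypothetical, a return action is modelled as a predicate on the returned value (playing the role of $0), and the function checks only whether one event sequence is a path of the LKS, which is weaker than the simulation check performed by MAGIC.

```python
# Each transition is (kind, label, successor). For a basic action the label
# is its name; for a return action it is a predicate on the returned value.
S1 = {
    "S1": [("call", "call_foo", "S2")],
    "S2": [("return", lambda r: r == 10, "STOP"),
           ("return", lambda r: r == 20, "STOP")],
    "STOP": [],
}

def matches(kind, label, event_kind, payload):
    """Does a transition label match an observed event?"""
    if kind != event_kind:
        return False
    return label == payload if kind == "call" else label(payload)

def accepts(lks, start, events):
    """True if the event sequence labels some path of the LKS from start."""
    current = {start}
    for event_kind, payload in events:
        current = {succ
                   for state in current
                   for (kind, label, succ) in lks[state]
                   if matches(kind, label, event_kind, payload)}
        if not current:
            return False
    return True

print(accepts(S1, "S1", [("call", "call_foo"), ("return", 20)]))  # True
print(accepts(S1, "S1", [("call", "call_foo"), ("return", 30)]))  # False
```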

Procedure  Block 

We are now ready to express the fact that S1 specifies the behavior of my_proc when
the first argument of my_proc is equal to zero. In MAGIC, this can be achieved by the
following procedure block:

cproc my_proc {
    abstract { abs_1 , ($1 == 0) , S1 };
}

Note  the  keywords  cproc  and  abstract.  The  block  keyword  cproc  indicates  that 
we  are  going  to  say  something  about  a  C  procedure.  It  is  followed  by  the  name  of  the 
procedure  and  then  by  a  set  of  statements  enclosed  within  a  pair  of  curly  braces.  Each 
such statement typically consists of a statement keyword followed by other terms.
The procedure whose name follows cproc is often referred to as the scope procedure.
One such statement keyword is abstract. This keyword indicates that we are
expressing an abstraction relation between the scope procedure and an LKS. Note
the guard ($1 == 0) where $1 refers to the first argument. In general $i can be
used to refer to the i-th argument of the scope procedure. Finally note that the
abstraction statement has a name, abs_1. For procedure blocks, the abstraction
names  are  just  placeholders  and  have  no  special  significance.  However,  soon  we  will 
discuss  program  blocks  and  for  them,  the  names  of  abstraction  statements  will  be  of 
crucial  importance. 

The  following  LKS  expresses  the  behavior  of  my_proc  when  its  first  argument  is 
not  equal  to  zero: 

S3 = ( call_bar -> S4 ),
S4 = ( return {$0 == 30} -> STOP | return {$0 == 40} -> STOP ).

Thus  we  can  have  another  procedure  block  to  specify  the  relation  between  my_proc 
and  S3. 

cproc  my_proc  { 

abstract  {  abs_2  ,  ($1  !=  0)  ,  S3  }; 

} 

In  general,  multiple  procedure  blocks  can  be  combined  into  one  as  long  as  they 
have  the  same  scope  procedure.  Also  the  order  of  statements  within  a  procedure 
block is irrelevant. Thus, the above two procedure blocks together are equivalent to
the  following  single  procedure  block: 

cproc my_proc {
    abstract { abs_2 , ($1 != 0) , S3 };
    abstract { abs_1 , ($1 == 0) , S1 };
}

MAGIC  requires  that  the  guards  of  abstraction  statements  for  any  scope  procedure 
be mutually disjoint and complete (i.e. cover all possibilities of argument valuations).
This is necessary to enable MAGIC to unambiguously identify the applicable
abstraction in any given calling context of the scope procedure.
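
The requirement amounts to saying that, for every argument valuation, exactly one guard holds. The sketch below tests this on a finite sample of argument values. It is only an illustration: the helper check_guards is hypothetical, sampling can reveal a violation but cannot prove its absence, and how MAGIC itself enforces the requirement is not described here.

```python
def check_guards(guards, sample_args):
    """Return the sampled arguments for which the number of true guards
    differs from one, i.e. where the guards overlap or leave a gap."""
    return [a for a in sample_args
            if sum(1 for g in guards.values() if g(a)) != 1]

# Guards of abs_1 and abs_2 from the procedure block above.
good = {"abs_1": lambda x: x == 0, "abs_2": lambda x: x != 0}
# A deliberately broken pair: overlapping at 0 and missing negative values.
bad = {"abs_1": lambda x: x == 0, "abs_2": lambda x: x >= 0}

print(check_guards(good, range(-3, 4)))  # []
print(check_guards(bad, range(-1, 2)))   # [-1, 0]
```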

Specifying  Library  Routines 

In order to construct a proper model for my_proc, MAGIC must know about the
behavior of the library routines called by my_proc. Let us assume that the actual
code for foo and bar is unavailable. In such a case, MAGIC requires that the user
supply  appropriate  abstractions  for  these  two  routines.  In  particular,  suppose  that 
foo  and  bar  are  respectively  abstracted  by  the  LKSs  FOO  and  BAR  described  below: 

FOO  =  (  call_foo  ->  return  {$0  ==  -1}  ->  STOP  ) . 

BAR  =  (  call_bar  ->  return  {$0  ==  50}  ->  STOP  ) . 

Then the following procedure blocks can be used to express the relation between
foo, bar and their abstractions.

cproc foo {
    abstract { abs_3 , (1) , FOO };
}

cproc bar {
    abstract { abs_4 , (1) , BAR };
}

Note  that  the  guard  in  both  abstraction  statements  is  1,  which  denotes  true 
according  to  C  semantics.  This  therefore  means  that  under  all  calling  contexts,  foo 
and  bar  are  abstracted  by  FOO  and  BAR  respectively.  Also  note  that  specifications  and 
abstractions  are  syntactically  identical.  This  makes  sense  because  both  abstractions 
and  specifications  are  essentially  asserting  the  same  thing  viz.  that  under  a  certain 
calling  context,  a  procedure’s  behavior  is  subsumed  by  the  behavior  of  an  LKS.  The 
only difference is that the assertion made by an abstraction can be assumed to be
true while the assertion made by a specification needs to be validated. This has at
least  two  significant  consequences: 

•  Verifying  Incomplete  Code:  In  practice,  one  cannot  assume  that  the  actual 
code  for  each  and  every  library  routine  used  by  a  program  will  be  available  to 
the verifier. Hence being able to provide abstractions allows MAGIC to analyze
such  incomplete  implementations.  In  effect,  abstractions  allow  us  to  specify 
assumptions  about  the  environment  in  which  a  program  operates. 

•  Compositionality:  Often  programs  are  simply  too  big  to  be  analyzed  as 
a  monolithic  piece  of  code.  Abstractions  allow  us  to  decompose  such  large 
implementations  into  smaller,  more  manageable  fragments.  Fragments  can 
be  verified  one  at  a  time.  While  verifying  one  fragment,  the  abstractions  of 
other  fragments  can  be  used  as  assumptions.  The  fact  that  specifications  and 
abstractions  are  identical  implies  that  they  can  naturally  switch  from  one  role 
to  the  other  depending  on  which  fragment  is  being  verified. 
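
As a concrete illustration of abstractions acting as assumptions, the sketch below transliterates my_proc and replaces foo and bar by stubs that return the values fixed by FOO and BAR. The transliteration and the stub names are ours and purely illustrative; MAGIC works on the C source and on LKSs, not on such a simulation.

```python
def my_proc(x, foo, bar):
    """Transliteration of the C procedure my_proc; foo and bar are passed in
    so that they can be replaced by stubs."""
    if x == 0:
        y = foo()
        return 10 if y > 0 else 20
    else:
        y = bar()
        return 30 if y < 0 else 40

def foo_stub():
    return -1   # FOO: call_foo -> return {$0 == -1}

def bar_stub():
    return 50   # BAR: call_bar -> return {$0 == 50}

print(my_proc(0, foo_stub, bar_stub))  # 20
print(my_proc(7, foo_stub, bar_stub))  # 40
```

Under these assumptions a call with a zero argument returns 20, which is allowed by S1, and a call with a non-zero argument returns 40, which is allowed by S3.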

Program  Block 

It  is  now  time  to  specify  the  entire  program  that  we  want  to  verify.  In  our  case 
the  program  is  sequential,  i.e.  it  has  a  single  component  consisting  of  the  procedure 
my_proc.  The  following  program  block  expresses  the  relation  between  our  program 
and  its  specification: 

cprog  my_prog  =  my_proc  { 

abstract  abs_5,  {($1  ==  0)},  SI; 
abstract  abs_6,  {($1  !=  0)},  S3; 

} 

This  looks  a  lot  like  a  procedure  block  but  there  are  some  crucial  differences.  First, 
it begins with the keyword cprog and not cproc. This is followed by the name of
the program (which is again a placeholder and does not serve any other purpose), an
equals sign and then a list of procedure names. Intuitively these are the names of
the  procedures  which  execute  in  parallel  and  constitute  the  program.  In  the  above 
block  this  list  has  a  single  procedure  name  viz.  my_proc,  signifying  that  our  program 
has  just  one  component  that  executes  my_proc. 

Following  the  list  of  procedure  names  we  have  a  sequence  of  statements  enclosed 
within  curly  braces.  Just  like  procedure  blocks,  abstraction  statements  are  used 
to  provide  specifications.  But  abstraction  statements  for  program  blocks  are 
syntactically different. They do begin with the abstract keyword, but the rest of
the statement is not enclosed within curly braces. Instead there are three components. The
first  is  the  name  of  the  abstraction  statement.  This  is  used  by  MAGIC  to  identify 
the  target  abstraction  to  be  validated.  The  second  is  a  list  of  guards,  one  for  each 
component  of  the  program.  Each  guard  in  the  list  expresses  the  beginning  state  of 
the  corresponding  component.  In  the  above  block,  the  list  has  just  one  element  that 
expresses  the  starting  context  of  my_proc.  Note  that  the  list  of  guards  is  enclosed 
within  curly  braces.  The  third  and  final  component  is  the  name  of  the  LKS  which 
specifies  the  program. 

Comments 

You can use either C-style or C++-style comments in specification files.

/* this is a comment */

// so is this one

6.4.3 Running MAGIC


We are now ready to try out MAGIC. First save the C program in a file whose name
must end with “.pp”, say my_proc.pp. Next save the specifications in another file
whose name ends with “.spec”, for example my_spec-1.0.spec. Finally run MAGIC:

$ magic --abstraction abs_5 my_proc.pp my_spec-1.0.spec --optPred

MAGIC will try to validate the abstraction statement with name abs_5. The
--optPred option tells MAGIC to perform counterexample guided abstraction
refinement with predicate minimization. It is usually a good idea to use
this option. For details on other options that MAGIC can accept, look at the user’s
manual. If all goes well, MAGIC should be able to successfully verify the abstraction
and produce an output that ends with something like this:

conformance relation exists !!
abstraction abs_5 is valid ...
Simplify process destroyed ...
terminating normally ...

Similarly you can try to verify abs_6 and MAGIC should be able to do it. If
you look again at my_spec-1.0.spec you will notice that we have added two more
abstraction statements, abs_7 and abs_8, to the my_prog block. They are similar
to abs_5 and abs_6 except that the guard conditions have been switched. Clearly
they are invalid specifications and MAGIC should be able to detect this. Try verifying
abs_7 by typing the following:

$ magic --abstraction abs_7 my_proc.pp my_spec-1.0.spec --optPred --ceShowAct

MAGIC  should  tell  you  that  this  is  an  invalid  specification  and  further  provide  you 
with a counterexample. The output should look something like the following:

branch ( PO::x == 0 ) : [PO::x == 0] : TRUE
############ PO::epsilon ############
PO::y = foo ( ) : []
############ PO::epsilon ############
PO::y = foo ( ) : []
############ call_foo ############
PO::y = foo ( ) : []
############ {PO::y = [ -1 ]} ############
branch ( PO::y > 0 ) : [] : TRUE
############ PO::epsilon ############
return ( 10 ) : []
############ return { 30 } ############
CE dag projections analysed ...
conformance relation does not exist !!
abstraction abs_7 is invalid ...
Simplify process destroyed ...
terminating normally ...

6.4.4  A  Concurrent  Example 

Let  us  now  verify  a  concurrent  program.  Our  concurrent  program  will  be  very 
simple.  It  will  be  two  copies  of  my_proc  executing  in  parallel.  This  is  easy  to 
understand  because  the  resulting  parallel  program  should  behave  exactly  like  a  single 
copy  of  my_proc  (since  our  notion  of  parallel  composition  is  idempotent).  All  we 
need  to  do  is  create  a  new  program  block  specifying  our  example.  Here  is  a  sample 
my_conc-1.0.spec. Notice that it has four abstraction statements abs_9, abs_10,
abs_11 and abs_12 out of which the first two are valid while the last two are invalid.
We can try to verify abs_9 by the following command:

$ magic --abstraction abs_9 my_proc.pp my_conc-1.0.spec --optPred

This should succeed. Likewise MAGIC should be able to prove that abs_10 is also
valid while abs_11 and abs_12 are both invalid.

6.4.5 Other Keywords


In  addition  to  abstract,  there  are  several  other  keywords  that  can  be  used  in 
procedure  blocks  for  performing  specific  tasks.  In  this  section  we  mention  a  few 
important  ones. 

Supplying  predicates 

The user can manually supply predicates to guide MAGIC’s predicate abstraction.
Often this is useful when MAGIC fails to discover a satisfactory set of predicates in
a reasonable amount of time. Predicates are supplied on a per-procedure basis. An
important feature of MAGIC is that all user-supplied predicates for a procedure proc
must be syntactically equivalent to some branch condition in proc. Otherwise that
predicate will be simply ignored by MAGIC. For example consider the following C
procedure:

int proc()
{
    int x = 5;

    if (x < 10) return -1;
    else return 0;
}

Suppose we want to prove using MAGIC that proc is correctly specified by the
following  LKS: 

PROC  =  (  return  {$0  ==  -1}  ->  STOP  ) . 

Normally we would do this by simply asking MAGIC to perform automated
abstraction refinement (using the --optPred option). However suppose we have
a good idea about which predicate MAGIC will need to complete successfully. For
example, in this case (x < 10) is the required predicate (note that this corresponds
to a branch condition in proc). Then we can simply tell MAGIC to use this predicate
by using the predicate keyword. The following procedure block shows how to do
this:

cproc  proc  { 

predicate  (x  <  10) ; 

} 

MAGIC  will  look  for  branch  statements  in  proc  which  have  a  branch  condition 
(x  <  10).  If  it  finds  any  such  branch,  it  will  use  the  corresponding  branch  condition 
as  a  seed  predicate.  Otherwise  it  will  ignore  the  user  supplied  predicate.  Multiple 
predicates  can  be  supplied  in  one  statement  using  a  comma-separated  list  or  they  can 
be  supplied  via  multiple  predicate  statements.  Also  the  order  in  which  predicates 
are  supplied  is  irrelevant.  For  example  the  two  following  procedure  blocks  each  have 
the  same  effect  as  the  procedure  block  above: 

cproc proc {
    predicate (y == 10) , (w == 5) , (z+w > 20) ,
              (x < 10) , (x+y != 5);
}

cproc proc {
    predicate (x+y != 5);
    predicate (z+w > 20) , (y == 10);
    predicate (x < 10) , (w == 5);
}

Inlining  Procedures 


Suppose procedure foo calls procedure bar. Normally MAGIC will not inline bar
within foo even if the code for bar is available. It has to be told explicitly to do
this via the inline keyword, as the procedure block below demonstrates. Once
again, inlining has to be done on a procedure-to-procedure basis. For example, the
procedure block below will not cause bar to be inlined within some other
procedure baz.

cproc foo {
    inline bar;
}

6.4.6  Drawing  with  MAGIC 

MAGIC can be used to output control flow graphs, LKSs and intermediate data
structures as postscript files. This is very useful for visualization and understanding.
For example, using the following command line on draw.pp and draw-1.0.spec files
produces a draw-1.0.ps file.

$ magic --abstraction abs_1 draw.pp draw-1.0.spec --optPred --drawPredAbsLTS --drawFile draw-1.0.ps

Also, using the following command line on my_proc.pp and my_spec-1.0.spec
yields my_proc-1.0.ps.

$ magic --abstraction abs_5 my_proc.pp my_spec-1.0.spec --optPred --drawPredAbsLTS --drawFile my_proc-1.0.ps

Please look at the user’s manual for more details on command line options that
control MAGIC’s drawing capabilities. Also note that in order to draw its figures
MAGIC requires the GRAPHVIZ package, and in particular the DOT tool. However,
if you do not want to use MAGIC’s drawing capabilities, there is no need to install
GRAPHVIZ. At this point, you should be more or less familiar with MAGIC and ready
to play around with it. Have fun!


6.5  Experimental  Results 

We  validated  the  game  semantics  based  refinement  approach  experimentally  using 
MAGIC. As the source code for our experiments we used version 0.9.6c of OpenSSL [95],
an open source implementation of the SSL [109] protocol used for secure exchange of
information  over  the  Internet.  In  the  rest  of  this  thesis  we  will  refer  to  version  0.9.6c 
of  OpenSSL  as  simply  OpenSSL.  In  particular,  we  used  the  code  that  implements  the 
server  side  of  the  initial  handshake  involved  in  SSL.  The  source  code  consisted  of 
about  74000  lines  including  comments  and  blank  lines.  In  Appendix  A  we  present 
our  OpenSSL  benchmark  in  greater  detail. 

All  our  experiments  were  carried  out  on  an  AMD  Athlon  1600  XP  machine  with 
900  MB  RAM  running  RedHat  7.1.  Our  experiments  were  carried  out  to  achieve  two 
broad  objectives.  First,  we  wanted  to  verify  the  advantages  of  simulation  conformance 
over  trace-containment  conformance.  Second,  we  wanted  to  evaluate  the  effect  of  using 
multiple  spurious  Counterexample  Trees  for  abstraction  refinement  in  every  iteration 
of  the  CEGAR  loop. 

As  we  shall  see  shortly,  our  experimental  results  indicate  that  compared  to 
trace  containment,  on  average,  simulation  leads  to  6.62  times  faster  convergence 
and  requires  11.79  times  fewer  iterations.  Furthermore,  refining  on  multiple 
Counterexample  Trees  per  iteration  leads  to  up  to  25%  improvement  in  performance. 
However,  using  more  than  four  Counterexample  Trees  is  counterproductive. 

We  manually  designed  a  set  of  eleven  specifications  by  reading  the  SSL 
documentation.  Each  of  these  specifications  was  required  to  be  obeyed  by  any 
correct  SSL  implementation.  Each  specification  captured  critical  safety  requirements. 
For  example,  among  other  things,  the  first  specification  enforced  the  fact  that  any 
handshake  is  always  initiated  by  the  client  and  followed  by  a  correct  authentication 
by  the  client.  Each  specification  combined  with  the  OpenSSL  source  code  yielded  one 
benchmark  for  experimentation. 

First, each benchmark was run twice, once using simulation and again using trace
containment as a notion of conformance. For each run, we measured the time and
number of iterations required. The resulting comparison is shown in Figure 6.3.
These results indicate that simulation leads to faster (on average by 6.62 times)
convergence and requires fewer (on average by 11.79 times) iterations.

Figure 6.3: Comparison between simulation and trace containment in terms of time and number
of iterations.

Recall  also  that  during  the  refinement  step  we  use  a  Counterexample  Tree  to 
create  a  more  refined  predicate  abstraction  of  some  component.  It  is  possible  that  in 
each  iteration  of  the  CEGAR  loop,  we  generate  not  one  but  a  set  of  Counterexample 
Trees  CW.  Using  our  benchmarks,  we  investigated  the  effect  of  increasing  the  size  of 
CW.  In  particular,  the  measurements  for  total  time  were  obtained  as  follows.  The 
size  of  CW  was  varied  from  1  to  15  and  for  each  value  of  |CW|,  the  time  required 
to  check  simulation  as  well  as  trace  containment  for  each  benchmark  was  measured. 
Finally  the  geometric  mean  of  these  measurements  was  taken.  The  measurements  for 
iterations  and  memory  were  obtained  in  a  similar  fashion. 
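
For reference, the aggregation step can be written as below. The timings in the example are made up for illustration and are not measurements from our benchmarks.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms to avoid
    overflow in the product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

hypothetical_times = [1.0, 4.0, 16.0]   # illustrative values only
mean_time = geometric_mean(hypothetical_times)
print(round(mean_time, 6))  # 4.0
```

Unlike the arithmetic mean, the geometric mean is not dominated by a single benchmark that takes much longer than the others.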

The graphs in Figure 6.4 summarize the results we obtained. The figures indicate
that it makes sense to refine on multiple counterexamples. We note that there is
consistent improvement in all three metrics up to |CW| = 4. Increasing |CW|
beyond four appears to be counterproductive. This supports our earlier intuition that
manipulating a large number of counterexamples is expensive and will only provide
diminishing returns beyond a certain threshold.

Figure 6.4: Time, iteration and memory requirements for different numbers of Counterexample
Trees.

Chapter 7


Predicate  Minimization 


As  we  have  already  seen,  predicate  abstraction  is  in  the  worst  case  exponential  (in  both 
time  and  memory  requirements)  in  the  number  of  predicates  involved.  Therefore,  a 
crucial  requirement  to  make  predicate  abstraction  effective  is  to  use  as  few  predicates 
as  possible.  Traditional  approaches  to  counterexample  guided  predicate  discovery 
analyze  each  new  spurious  counterexample  in  isolation  and  accumulate  predicates 
monotonically.  This  may  lead  to  larger  than  necessary  sets  of  predicates,  which  may 
result  in  an  inability  to  solve  certain  problem  instances. 

For example, consider a scenario where the first counterexample, $CW_1$, can be
eliminated by either predicate $\{p_1\}$ or $\{p_2\}$, and the predicate discovery process
chooses $\{p_1\}$. Now the CEGAR algorithm finds another counterexample $CW_2$, which
can only be eliminated by the predicate $\{p_2\}$. The CEGAR algorithm now proceeds
with the set of predicates $\{p_1, p_2\}$, although $\{p_2\}$ by itself is sufficient to eliminate
both $CW_1$ and $CW_2$. This is clearly undesirable.
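
The scenario above can be replayed with a minimal sketch. Each counterexample is modeled as a list of alternative predicate sets, any one of which eliminates it. The function names and the rule that discovery always takes the first alternative are assumptions made for illustration only.

```python
def eliminates(chosen, alternatives):
    """True if some alternative predicate set is contained in `chosen`."""
    return any(alt <= chosen for alt in alternatives)

def monotone_refinement(counterexamples):
    """Process counterexamples in order, only ever adding predicates."""
    chosen = set()
    for alternatives in counterexamples:
        if not eliminates(chosen, alternatives):
            chosen |= alternatives[0]  # assumption: discovery takes the first option
    return chosen

cw1 = [{"p1"}, {"p2"}]  # CW_1: eliminated by {p1} or by {p2}
cw2 = [{"p2"}]          # CW_2: eliminated only by {p2}

print(sorted(monotone_refinement([cw1, cw2])))           # ['p1', 'p2']
print(all(eliminates({"p2"}, cw) for cw in (cw1, cw2)))  # True
```

The monotone strategy ends with two predicates, while the second line confirms that $\{p_2\}$ alone eliminates both counterexamples.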

In Chapter 6 we presented a naive algorithm AbsRefine for finding a minimal
sufficient predicate set from a given set of candidate predicates. In the above
scenario, AbsRefine would indeed choose $\{p_2\}$ as the new set of predicates after
encountering both $CW_1$ and $CW_2$. However, it will perform redundant computation
involving $CW_1$. In this chapter we will present a more sophisticated abstraction
refinement algorithm called AbsRefMin which avoids the redundant computation
without sacrificing the minimality of the result.

We  have  implemented  AbsRefMin  in  the  MAGIC  tool.  Our  experimental  results 
show  that  AbsRefMin  can  significantly  reduce  the  number  of  predicates  and 
consequently  the  amount  of  memory  required  in  comparison  to  the  naive  AbsRefine 
algorithm  as  well  as  existing  tools  such  as  BLAST. 

7.1  Related  work 

Predicate  abstraction  was  introduced  by  Graf  and  Saidi  in  [63].  It  was  subsequently 
used  with  considerable  success  in  both  hardware  and  software  verification  [6,50,66]. 
The  notion  of  CEGAR  was  originally  introduced  by  Kurshan  [76]  (originally  termed 
localization)  for  model  checking  of  finite  state  models.  Both  the  abstraction  and 
refinement  techniques  for  such  systems,  as  employed  in  his  and  subsequent  [37,  38] 
research  efforts,  are  essentially  different  from  the  predicate  abstraction  approach 
we  follow.  For  instance,  abstraction  in  localization  reduction  is  done  by  non- 
deterministically  assigning  values  to  selected  sets  of  variables,  while  refinement 
corresponds  to  gradually  returning  to  the  original  definition  of  these  variables. 

More recently the CEGAR framework has also been successfully adapted for
verifying infinite state systems [102], and in particular software [7,66]. The problem
of finding small (but not minimal) sets of predicates has also been investigated in the
context of hardware designs in [33]. The work most closely related to ours, however,
is that of Clarke, Gupta, Kukula and Strichman [40] where the CEGAR approach
is combined with integer linear programming techniques to obtain a minimal set of
variables that separate sets of concrete states into different abstract states.

7.2  Pseudo-Boolean  Constraints 

A pseudo-Boolean (PB) formula is of the form $\sum_{i=1}^{n} c_i \cdot b_i \bowtie k$, where: (i) $b_i$ is a
Boolean variable, (ii) $c_i$ is a rational constant for $1 \le i \le n$, (iii) $k$ is a rational
constant and (iv) $\bowtie$ represents one of the inequality or equality relations from the
set $\{<, \le, >, \ge, =\}$. Each such constraint can be expanded to a CNF formula (hence
the name pseudo-Boolean). Hence a naive way to solve for the satisfiability of a
PB formula is to translate it to a CNF formula and then use a standard CNF SAT
solver [87,104-106,114].

Example 17 A typical PB formula is $\phi_1 \equiv (2/3)x + 5y - (3/4)z < 6$. This formula
is equivalent to the much simpler $\phi_2 \equiv 8x + 60y - 9z < 72$. Now we can convert $\phi_2$
into a purely propositional form. A common way to do this is to assume each variable
appearing in $\phi_2$ to be a bit-vector of some fixed width $w$. We can then encode each
variable using $w$ Boolean propositions and use standard Boolean encodings for the
arithmetic and relational operators appearing in $\phi_2$ to complete the transformation.
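
The first step of Example 17, passing from rational to integer coefficients, amounts to multiplying both sides of the constraint by the least common multiple of all denominators (here 12). Since this factor is positive, the direction of the inequality is preserved. The following sketch illustrates that step only; the helper name `clear_denominators` is hypothetical, and `math.lcm` assumes Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm

def clear_denominators(coeffs, bound):
    """Scale coefficients and bound by the LCM of all denominators."""
    terms = list(coeffs.values()) + [bound]
    scale = lcm(*(Fraction(t).denominator for t in terms))
    scaled = {var: int(Fraction(c) * scale) for var, c in coeffs.items()}
    return scaled, int(Fraction(bound) * scale)

# (2/3)x + 5y - (3/4)z < 6
coeffs = {"x": Fraction(2, 3), "y": Fraction(5), "z": Fraction(-3, 4)}
print(clear_denominators(coeffs, Fraction(6)))
# ({'x': 8, 'y': 60, 'z': -9}, 72)
```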

However the expanded CNF form of a PB formula $\varphi$ can be exponential in the
size of $\varphi$. The Pseudo-Boolean Solver (PBS) [2] does not perform this expansion, but
rather uses an algorithm designed in the spirit of the Davis-Putnam-Loveland [51,52]
algorithm that handles these constraints directly. PBS accepts as input standard
CNF formulas augmented with pseudo-Boolean constraints. Given a standard CNF
formula $\phi$ and an objective function $\varphi$, PBS finds an optimal solution $s_{opt}$ for $\phi$. The
objective function $\varphi$ is usually an arithmetic expression over the variables of $\phi$ treated
as having a value of either zero or one. The result $s_{opt}$ is optimal in the sense that it
minimizes the value of $\varphi$. PBS achieves this by repeatedly tightening the constraint
over the value of $\varphi$ until $\phi$ becomes unsatisfiable.

More precisely, PBS first finds a satisfying solution $s$ to $\phi$ and calculates the value
of $\varphi$ according to $s$. Let this value be $\varphi(s)$. PBS then updates $\phi$ by adding a constraint
that the value of $\varphi$ should be less than $\varphi(s)$ and re-solves for the satisfiability of $\phi$.
This process is repeated until $\phi$ becomes unsatisfiable. The output $s_{opt}$ is then the
last satisfying solution for $\phi$. Note that a possible improvement on this process is
to perform a binary (rather than a linear) search over the value of $\varphi$. However, the
performance of PBS was not a bottleneck in any of our experiments.
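
The linear search described above can be sketched as follows. This is not the PBS implementation: the function `solve` is a brute-force stand-in for a SAT call, clauses use DIMACS-style signed integers, and the objective is assumed to be a weighted sum of the variables. All names are hypothetical.

```python
from itertools import product

def solve(clauses, num_vars, weights, bound):
    """Toy stand-in for a SAT call: return (assignment, value) for some
    assignment satisfying all clauses with objective value < bound, or None."""
    for bits in product([False, True], repeat=num_vars):
        value = sum(w for w, b in zip(weights, bits) if b)
        if value < bound and all(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return bits, value
    return None

def minimize(clauses, num_vars, weights):
    """Tighten the bound on the objective until no solution remains."""
    best, bound = None, float("inf")
    while True:
        found = solve(clauses, num_vars, weights, bound)
        if found is None:
            return best            # last satisfying solution, or None
        best, bound = found, found[1]

# (x1 or x2) and (not x1 or x3), minimizing the number of true variables.
print(minimize([[1, 2], [-1, 3]], 3, [1, 1, 1]))
# ((False, True, False), 1)
```

Each round strictly lowers the bound, so the loop terminates; a binary search over the bound would need fewer solver calls when the objective has a wide range.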

7.3  Predicate  Minimization 

We now describe the algorithm AbsRefMin (presented in Procedure 7.1) for
computing a refined abstraction based on a minimal set of branches that can eliminate
a set of spurious Counterexample Witness projections. Essentially AbsRefMin
works as follows. Let $B_C = \{b_1, \ldots, b_k\}$ be the set of branches of the component
$C$ as usual. Suppose we introduce Boolean variables $BV = \{v_1, \ldots, v_k\}$, where each
$v_i$ corresponds to the branch $b_i$. The arguments to AbsRefMin are: (i) a spurious
Counterexample Witness projection $CW$, (ii) a component $C$, (iii) a Boolean formula
$\phi$ over $BV$, and (iv) a context $\gamma$ for $C$.

Intuitively, $\phi$ captures the information about the sets of branches which
can eliminate all the spurious Counterexample Witness projections encountered
previously. More precisely, suppose $CW'$ is a spurious Counterexample Witness
projection seen before. Suppose that the set of sets of branches which can eliminate
$CW'$ is $\{B_1, \ldots, B_l\}$ and for $1 \le i \le l$, let $B_i = \{b_{i,1}, \ldots, b_{i,n_i}\}$. Let us then define
the formula $\phi_{CW'}$ as follows:

\[ \phi_{CW'} = \bigvee_{1 \le i \le l} \Big( \bigwedge_{1 \le j \le n_i} v_{i,j} \Big) \]

Note that since the elements of each $B_i$ are branches of $C$, the Boolean variables
in $\phi_{CW'}$ are from the set $BV$ described earlier. Now consider any satisfying solution $s$
to $\phi_{CW'}$ such that $s$ assigns the variables $\{v_1, \ldots, v_m\}$ to TRUE. It should be obvious
from the above definition that the set of branches $\{b_1, \ldots, b_m\}$ suffices to eliminate
$CW'$. Now suppose that the set of all spurious Counterexample Witness projections
seen previously is $\mathcal{CW}$. Then the formula $\phi$ is defined as:

\[ \phi = \bigwedge_{CW' \in \mathcal{CW}} \phi_{CW'} \]

Now consider any satisfying solution $s$ to $\phi$ such that $s$ assigns the variables
$\{v_1, \ldots, v_m\}$ to TRUE. Once again, it should be obvious from the above definition
that the set of branches $\{b_1, \ldots, b_m\}$ suffices to eliminate every element of $\mathcal{CW}$.

Example 18 Suppose $C$ has four branches $\{b_1, b_2, b_3, b_4\}$. Hence there are four
Boolean variables $\{v_1, v_2, v_3, v_4\}$. Consider a Counterexample Witness projection
$CW_1$ such that $CW_1$ is eliminated by either branch $b_1$ alone or by branches $b_2$, $b_3$
and $b_4$ together. Hence the Boolean formula $\phi_{CW_1}$ is $(v_1) \lor (v_2 \land v_3 \land v_4)$.

Again consider another Counterexample Witness projection $CW_2$ such that $CW_2$
is eliminated by either branch $b_4$ alone or by branches $b_1$, $b_2$ and $b_3$ together. Hence
the Boolean formula $\phi_{CW_2}$ is $(v_4) \lor (v_1 \land v_2 \land v_3)$.

Let the set of Counterexample Witness projections under consideration be $\mathcal{CW} =
\{CW_1, CW_2\}$. Then the formula $\phi = \phi_{CW_1} \land \phi_{CW_2} = ((v_1) \lor (v_2 \land v_3 \land v_4)) \land ((v_4) \lor
(v_1 \land v_2 \land v_3))$.

There are many satisfying solutions to $\phi$. For instance one solution is
($v_1$ = FALSE, $v_2$ = TRUE, $v_3$ = TRUE, $v_4$ = TRUE). This means that the set of branches
$\{b_2, b_3, b_4\}$ is sufficient to eliminate both $CW_1$ and $CW_2$.

Yet another solution is ($v_1$ = TRUE, $v_2$ = FALSE, $v_3$ = FALSE, $v_4$ = TRUE)
indicating that the set of branches $\{b_1, b_4\}$ is also sufficient to eliminate both
$CW_1$ and $CW_2$. In fact this is a minimal such set.
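
The claims of Example 18 can be checked mechanically. In the sketch below, each $\phi_{CW}$ is represented as a list of branch-index sets (one per disjunct), and a brute-force search by increasing cardinality plays the role of the optimizing solver. The representation and the function names are assumptions for illustration, not part of MAGIC.

```python
from itertools import combinations

# One entry per counterexample; each entry lists its eliminating branch sets.
phi = [
    [{1}, {2, 3, 4}],   # phi_CW1 = v1 or (v2 and v3 and v4)
    [{4}, {1, 2, 3}],   # phi_CW2 = v4 or (v1 and v2 and v3)
]

def satisfies(true_vars, formula):
    """Every conjunct must have one disjunct whose variables are all TRUE."""
    return all(any(term <= true_vars for term in disjuncts)
               for disjuncts in formula)

def minimum_solutions(formula, num_vars):
    """All satisfying variable sets of the smallest possible cardinality."""
    for size in range(num_vars + 1):
        hits = [set(c) for c in combinations(range(1, num_vars + 1), size)
                if satisfies(set(c), formula)]
        if hits:
            return hits
    return []

print(satisfies({2, 3, 4}, phi))   # True
print(minimum_solutions(phi, 4))   # [{1, 4}]
```

No single branch eliminates both counterexamples, so $\{b_1, b_4\}$ is the unique solution of minimum size.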

Algorithm AbsRefMin first updates $\phi$ by adding clauses corresponding to the
new spurious Counterexample Witness projection $CW$. It then solves for the
satisfiability of $\phi$ along with the pseudo-Boolean constraint $\varphi = \sum_{i=1}^{k} v_i$. Since the
solution satisfies $\phi$ and minimizes $\varphi$, it clearly corresponds to a minimal set of branches
which can eliminate all previous spurious Counterexample Witness projections as well
as $CW$.

Note that $\phi$ is updated in place so that the next invocation of AbsRefMin uses
the updated $\phi$. Also note that if $\phi$ is unsatisfiable, it means that there is at least one
spurious Counterexample Witness projection which cannot be eliminated by any set
of branches. In other words, there is some $CW$ for which $\phi_{CW}$ is equivalent to FALSE.
In this case AbsRefMin returns ERROR.

In order to compute $\phi_{CW}$, AbsRefMin naively iterates over every subset of $B_C$.
However this results in its complexity being exponential in the size of $B_C$. Below we
list several ways to reduce the number of subsets attempted by AbsRefMin:

•  Limit the cardinality or number of attempted subsets to a small constant, e.g.
5, assuming that most Counterexample Witness projections can be eliminated
by a small set of branches.

•  Stop after reaching a certain size of subsets if any eliminating solutions have
been found.

•  Break up the control flow graph of $C$ into blocks and only consider subsets of
branches within blocks (keeping subsets in other blocks fixed).

•  Use data flow analysis to only consider subsets of related branches.

•  For any $CW$, if a set $p$ eliminates $CW$, ignore all supersets of $p$ with respect
to $CW$ (as we are seeking a minimal solution).

Procedure 7.1 AbsRefMin returns a refined abstraction for $C$ based on a minimal set of
branches that eliminates a set of spurious Counterexample Witness projections and ERROR on
failure. The parameter $\phi$ initially expresses constraints about branches which can eliminate all
previous spurious Counterexample Witness projections. AbsRefMin also updates $\phi$ with the
constraints for the new spurious Counterexample Witness projection $CW$.

Algorithm AbsRefMin($CW$, $C$, $\phi$, $\gamma$)
    - $CW$ : is a spurious Counterexample Witness, $\phi$ : is a Boolean formula
    - $C$ : is a component, $\gamma$ : is a context for $C$
    let $CW = (S, Init, AP, L, \Sigma, T)$;
    $\phi_{CW}$ := FALSE;
    for each $B \subseteq B_C$    // $B_C$ is the set of branches in $C$
        $\Pi$ := PredInfer($C$, $\gamma$, $B$);    // $\Pi$ is the set of predicates inferred from $B$
        let $M$ := $[\![C]\!]_{\gamma}^{\Pi} = (\hat{S}, \widehat{Init}, AP, L, \Sigma, \hat{T})$;
        // $M$ is the predicate abstraction of $C$ using $\Pi$
        if (AbsCanSimul($CW$, $Init$, $M$) $\cap\ \widehat{Init} = \emptyset$) then $\phi_{CW}$ := $\phi_{CW} \lor \bigwedge_{b_i \in B} v_i$;
        // $CW \not\preceq M$, and hence $B$ can eliminate $CW$
    $\phi$ := $\phi \land \phi_{CW}$;    // update $\phi$ with the constraints for $CW$
    invoke PBS to solve $(\phi, \sum_{i=1}^{k} v_i)$;
    if $\phi$ is unsatisfiable then return ERROR;    // no set of branches can eliminate $CW$
    else let $s_{opt}$ = solution returned by PBS;
    let $\{v_1, \ldots, v_m\}$ = variables assigned TRUE by $s_{opt}$ and $B = \{b_1, \ldots, b_m\}$;
    $\Pi$ := PredInfer($C$, $\gamma$, $B$);    // $\Pi$ is the set of predicates inferred from $B$
    return $[\![C]\!]_{\gamma}^{\Pi}$;    // return the predicate abstraction of $C$ using $\Pi$

In  our  experiments  we  used  some  of  the  above  techniques.  Details  are  presented  in 
Section  7.5.  In  conclusion,  we  note  that  other  techniques  for  solving  this  optimization 
problem  are  also  possible,  including  minimal  hitting  sets  and  logic  minimization.  The 
PBS  step,  however,  has  not  been  a  bottleneck  in  any  of  our  experiments. 
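
The last reduction listed above (ignoring supersets of an already eliminating set) can be sketched as follows. The parameter `can_eliminate` is a hypothetical oracle standing in for the check that AbsRefMin performs through PredInfer, predicate abstraction and AbsCanSimul; the function name and the optional `max_size` cap are likewise assumptions.

```python
from itertools import combinations

def minimal_eliminating_sets(branches, can_eliminate, max_size=None):
    """Collect eliminating subsets, smallest first, skipping supersets."""
    found = []
    limit = len(branches) if max_size is None else max_size
    for size in range(limit + 1):
        for subset in map(frozenset, combinations(branches, size)):
            if any(known <= subset for known in found):
                continue  # superset of a known eliminating set: skip the check
            if can_eliminate(subset):
                found.append(subset)
    return found

# Toy oracle reproducing CW_1 of Example 18.
oracle = lambda s: 1 in s or {2, 3, 4} <= s
print(minimal_eliminating_sets([1, 2, 3, 4], oracle))
```

For this oracle the search returns the two sets $\{b_1\}$ and $\{b_2, b_3, b_4\}$, which are exactly the disjuncts of $\phi_{CW_1}$. Dropping the supersets does not change the minimum found by the solver, since any solution containing a superset also contains the smaller set.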

7.4  CEGAR  with  Predicate  Minimization 

In this section we present the complete CEGAR algorithm for simulation conformance
that uses AbsRefMin instead of AbsRefine for abstraction refinement. The
algorithm is called SimulCEGARMin and is presented in Procedure 7.2. It is
similar to SimulCEGAR except that instead of maintaining the set of spurious
Counterexample Witness projections $\mathcal{CW}_i$ for each component $C_i$, it maintains the
formula $\phi_i$. The proof of its correctness is also similar to that of SimulCEGAR.

7.5  Experimental  Results 

We  implemented  our  technique  inside  MAGIC  and  experimented  with  a  variety  of 
benchmarks.  Each  benchmark  consisted  of  an  implementation  (a  C  program)  and  a 
specification  (provided  separately  as  an  LKS).  All  of  the  experiments  were  carried  out 
on  an  AMD  Athlon  1600  XP  machine  with  900  MB  RAM  running  RedHat  7.1.  In  this 
section  we  describe  our  results  in  the  context  of  checking  the  effectiveness  of  predicate 
minimization. We also present results comparing our predicate minimization scheme
with a greedy predicate minimization strategy implemented on top of MAGIC.

Procedure 7.2 SimulCEGARMin checks simulation conformance between a program $\mathcal{P}$
and a specification $Sp$ in a context $\Gamma$.

Algorithm SimulCEGARMin($\mathcal{P}$, $Sp$, $\Gamma$)
    - $\mathcal{P}$ : is a program, $\Gamma$ : is a context for $\mathcal{P}$
    - $Sp$ : is a specification LKS
    let $\mathcal{P} = \langle C_1, \ldots, C_n \rangle$ and $\Gamma = \langle \gamma_1, \ldots, \gamma_n \rangle$;
    for each $i \in \{1, \ldots, n\}$
        $\Pi_i$ := PredInfer($C_i$, $\gamma_i$, $\emptyset$) and $M_i$ := $[\![C_i]\!]_{\gamma_i}^{\Pi_i}$ and $\phi_i$ := TRUE;
        // $M_i$ = initial predicate abstraction of $C_i$ with empty set of seed branches
    forever do
        let $M = M_1 \parallel \cdots \parallel M_n$;
        // $M$ is the composition of predicate abstractions
        if (SimulWitness($M$, $Sp$) = "$M \preceq Sp$") return "$\mathcal{P} \preceq Sp$";
        // if the property holds on $M$ it also holds on $\mathcal{P}$
        let $CW$ = Counterexample Witness returned by SimulWitness;
        find $i \in \{1, \ldots, n\}$ such that $\neg$WeakSimul($CW \upharpoonright \gamma_i$, $C_i$, $\gamma_i$);
        // check compositionally if $CW$ is spurious
        if (no such $i$ found) return "$\mathcal{P} \not\preceq Sp$";
        // $CW$ is valid and hence $\mathcal{P}$ is not simulated by $Sp$
        if (AbsRefMin($CW \upharpoonright \gamma_i$, $C_i$, $\phi_i$, $\gamma_i$) = ERROR) return ERROR;
        // no set of branches can eliminate $CW \upharpoonright \gamma_i$
        $M_i$ := AbsRefMin($CW \upharpoonright \gamma_i$, $C_i$, $\phi_i$, $\gamma_i$);    // refine the abstraction and repeat


7.5.1  The  Greedy  Approach 

In each iteration, this greedy strategy first adds predicates sufficient to eliminate the
spurious Counterexample Witness to its candidate branch set $B$. Then it attempts
to reduce the size of the resulting $B$ by using the algorithm GreedyMin described
in Procedure 7.3. The advantage of this approach is that it requires only a small
overhead (polynomial) compared to AbsRefMin, but on the other hand it does not
guarantee an optimal result. Further, we performed experiments with the BLAST [66]
tool. BLAST also takes C programs as input, and uses a variation of the standard
CEGAR loop based on lazy abstraction, but without minimization. Lazy abstraction
refines an abstract model while allowing different degrees of abstraction in different
parts of a program, without requiring recomputation of the entire abstract model in
each iteration. Laziness and predicate minimization are, for the most part, orthogonal
techniques. In principle a combination of the two might produce better results than
either in isolation.

Procedure 7.3 GreedyMin returns a greedily computed refined abstraction for $C$ that
eliminates every spurious Counterexample Witness projection in $\mathcal{CW}$.

Algorithm GreedyMin($\mathcal{CW}$, $C$, $\gamma$)
    - $\mathcal{CW}$ : is a set of Counterexample Witnesses
    - $C$ : is a component, $\gamma$ : is a context for $C$
    $B$ := $B_C$;    // start with the set of all branches in $C$
    // repeatedly try to remove elements from $B$ while maintaining the invariant that
    // every spurious Counterexample Witness projection in $\mathcal{CW}$ can still be eliminated
    Loop:
        Create random ordering $\{b_1, \ldots, b_k\}$ of $B$;
        for $i$ = 1 to $k$
            $B'$ := $B \setminus \{b_i\}$;    // check if $b_i$ is redundant
            if $B'$ can eliminate all elements of $\mathcal{CW}$
                $B$ := $B'$;    // $b_i$ is redundant and can be eliminated
                goto Loop;
    $\Pi$ := PredInfer($C$, $\gamma$, $B$);    // $\Pi$ is the set of predicates inferred from $B$
    return $[\![C]\!]_{\gamma}^{\Pi}$;    // return the predicate abstraction of $C$ using $\Pi$
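
The branch-removal loop of GreedyMin can be sketched in Python as follows. The oracle `eliminates_all` is a hypothetical stand-in for the check "$B'$ can eliminate all elements of $\mathcal{CW}$", and the final PredInfer and abstraction steps are omitted. The toy oracle reuses the two counterexamples of Example 18.

```python
import random

def greedy_min(branches, eliminates_all, rng=None):
    """Drop branches one at a time while all counterexamples stay eliminable.

    The result is irreducible (no single branch can be removed), but it is
    not guaranteed to have minimum size.
    """
    rng = rng or random.Random()
    current = set(branches)
    progress = True
    while progress:
        progress = False
        order = sorted(current)
        rng.shuffle(order)                # random ordering of the branches
        for b in order:
            candidate = current - {b}
            if eliminates_all(candidate):
                current = candidate       # b is redundant
                progress = True
                break                     # restart with a fresh ordering
    return current

both = lambda s: (1 in s or {2, 3, 4} <= s) and (4 in s or {1, 2, 3} <= s)
print(sorted(greedy_min([1, 2, 3, 4], both, random.Random(0))))
```

Depending on the ordering, this toy run ends in $\{b_1, b_4\}$, $\{b_2, b_3, b_4\}$ or $\{b_1, b_2, b_3\}$. Only the first is of minimum size, which illustrates why the greedy strategy does not guarantee an optimal result.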

7.5.2  Benchmarks 

We used two kinds of benchmarks. A small set of relatively simple benchmarks were
derived from the examples supplied with the BLAST distribution and regression tests
for MAGIC. The difficult benchmarks were derived from the C source code of OpenSSL.
A critical component of this protocol is the initial handshake between a server and
a client. We verified different properties of the main routines that implement the
handshake. The names of benchmarks that are derived from the server routine and
client routine begin with ssl-srvr and ssl-clnt respectively. In all our benchmarks,
the properties are satisfied by the implementation. Note that all these benchmarks
involved purely sequential C code.

                |         MAGIC + GREEDY          |        MAGIC + MINIMIZE
Program         | Time    Iter  Pred         Mem  | Time  Iter  Pred        Mem
----------------+---------------------------------+-------------------------------
funcall-nested  | 6       2     10/9/1       x    | 5     2     10/9/1      x
fun_lock        | 5       5     8/3/3        x    | 6     4     8/3/3       x
driver.c        | 5       5     6/2/4        x    | 5     5     6/2/4       x
read.c          | 6       3     15/5/1       x    | 5     2     15/5/1      x
socket-y-01     | 5       3     12/4/2       x    | 6     3     12/4/2      x
opttest.c       | 150     5     4/4/4        63   | 247   25    4/4/4       63
ssl-srvr-1      | *       103   16/3/5       51   | 226   14    5/4/2       38
ssl-srvr-2      | 2106    62    8/4/3        34   | 216   14    5/4/2       38
ssl-srvr-3      | *       100   22/3/7       53   | 200   12    5/4/2       38
ssl-srvr-4      | 8465    69    14/4/5       56   | 170   9     5/4/2       38
ssl-srvr-5      | *       117   23/5/9       56   | 205   13    5/4/2       36
ssl-srvr-6      | *       84    22/4/8       337  | 359   14    8/4/3       89
ssl-srvr-7      | *       99    19/3/6       62   | 196   11    5/4/2       38
ssl-srvr-8      | *       97    19/4/7       142  | 211   10    8/4/3       40
ssl-srvr-9      | 8133    99    11/4/4       69   | 316   20    11/4/4      38
ssl-srvr-10     | *       97    12/3/4       77   | 241   14    8/4/3       38
ssl-srvr-11     | *       87    26/4/9       65   | 356   24    8/4/3       38
ssl-srvr-12     | *       122   23/4/8       180  | 301   17    8/4/3       42
ssl-srvr-13     | *       106   19/4/7       69   | 436   29    11/4/4      38
ssl-srvr-14     | *       115   18/3/6       254  | 406   20    8/4/3       52
ssl-srvr-15     | 2112    37    8/4/3        118  | 179   7     8/4/3       40
ssl-srvr-16     | *       103   22/3/7       405  | 356   17    8/4/3       58
ssl-clnt-1      | 225     27    5/4/2        20   | 156   12    5/4/2       31
ssl-clnt-2      | 1393    63    5/4/2        23   | 185   18    5/4/2       29
ssl-clnt-3      | *       136   29/4/10      28   | 195   21    5/4/2       29
ssl-clnt-4      | 152     29    5/4/2        20   | 191   19    5/4/2       29
----------------+---------------------------------+-------------------------------
TOTAL           | 163163  1775  381/102/129  2182 | 5375  356   191/107/67  880
AVERAGE         | 6276    68    15/4/5       104  | 207   14    7/4/3       42

Table 7.1: Comparison of MAGIC with the greedy approach. ‘*’ indicates run-time longer than
3 hours, ‘x’ indicates negligible values. Best results are emphasized.


7.5.3  Results  Summary 

Table 7.1 summarizes the comparison of our predicate minimization strategy with
the greedy approach. Time consumptions are given in seconds. For predicate
minimization, instead of solving the full optimization problem, we simplified the
problem as described in Section 7.3. In particular, for each trace we only considered
the first 1,000 combinations and only generated 20 eliminating combinations. The
combinations were considered in increasing order of size. After all combinations
of a particular size had been tried, we checked whether at least one eliminating
combination had been found. If so, no further combinations were tried. In the smaller
examples we observed no loss of optimality due to these restrictions. We also studied
the effect of altering these restrictions on the larger benchmarks and we report our
findings later.

Table 7.2: Results for MAGIC with and without minimization. ‘*’ indicates run-time longer
than 3 hours, ‘x’ indicates negligible values. Best results are emphasized.
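
The restricted search just described can be sketched as follows. The parameter names `max_sub` and `max_elm` and the oracle `can_eliminate` are hypothetical; the defaults mirror the limits of 1,000 examined and 20 eliminating combinations. This is an illustration of the stopping rules under those assumptions, not the code used in MAGIC.

```python
from itertools import combinations

def bounded_search(branches, can_eliminate, max_sub=1000, max_elm=20):
    """Enumerate subsets by increasing size under two caps.

    Stops when `max_sub` subsets were examined or `max_elm` eliminating
    subsets were found. After finishing all subsets of one size, stops if
    at least one eliminating subset has been found so far.
    """
    examined, found = 0, []
    for size in range(len(branches) + 1):
        for subset in combinations(branches, size):
            if examined >= max_sub or len(found) >= max_elm:
                return found
            examined += 1
            if can_eliminate(set(subset)):
                found.append(set(subset))
        if found:
            return found
    return found

# Toy oracle reproducing CW_1 of Example 18.
oracle = lambda s: 1 in s or {2, 3, 4} <= s
print(bounded_search([1, 2, 3, 4], oracle))  # [{1}]
```

In this toy run the search stops after size one, so the larger eliminating set $\{b_2, b_3, b_4\}$ is never generated. This is the kind of potential loss of optimality that the restrictions trade for speed.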

Table 7.2 shows the improvement observed in MAGIC upon using predicate
minimization while Table 7.3 shows the comparison between predicate minimization
and BLAST. Once again, time consumptions are reported in seconds. The column Iter
reports the number of iterations through the CEGAR loop necessary to complete the
proof. Predicates are listed differently for the two tools. For BLAST, the first number
is the total number of predicates discovered and used and the second number is the
number of predicates active at any one point in the program (due to lazy abstraction
this may be smaller). In order to force termination we imposed a limit of three hours
on the running time. We denote by ‘*’ in the Time column examples that could
not be solved in this time limit. In these cases the other columns indicate relevant
measurements made at the point of forceful termination.

Table 7.3: Results for BLAST and MAGIC with predicate minimization. ‘*’ indicates run-time
longer than 3 hours, ‘x’ indicates negligible values. Best results are emphasized.

For MAGIC, the first number is the total number of expressions used to prove the
property. The number of predicates (the second number) may be smaller, as MAGIC
combines multiple mutually exclusive expressions (e.g., x == 1, x < 1, and x > 1)
into a single, possibly non-binary predicate, having a number of values equal to the
number of expressions (plus one, if the expressions do not cover all possibilities). The
final number for MAGIC is the size of the final set of branches. For experiments in
which memory usage was large enough to be a measure of state-space size rather than
overhead, we also report memory usage (in megabytes).

For  the  smaller  benchmarks,  the  various  abstraction  refinement  strategies  do 
not  differ  markedly.  However,  for  our  larger  examples  derived  from  OpenSSL,  the 
refinement  strategy  is  of  considerable  importance.  Predicate  minimization,  in  general, 
reduced  verification  time  (though  there  were  a  few  exceptions  to  this  rule,  the  average 
running  time  was  considerably  lower  than  for  the  other  techniques,  even  with  the 
cutoff  on  the  running  time).  Moreover,  predicate  minimization  reduced  the  memory 
needed for verification, which is an even more important bottleneck. Given that, for
the other techniques, memory usage was in some cases measured at the time cutoff,
before verification was complete, the results are even more compelling.

The  greedy  approach  kept  memory  use  fairly  low,  but  almost  always  failed  to 
find  near-optimal  predicate  sets  and  converged  much  more  slowly  than  the  usual 
monotonic  refinement  or  predicate  minimization  approaches.  Further,  it  is  not  clear 
how  much  final  memory  usage  would  be  improved  by  the  greedy  strategy  if  it  were 
allowed  to  run  to  completion.  Another  major  drawback  of  the  greedy  approach  is  its 
unpredictability.  We  observed  that  on  any  particular  example,  the  greedy  strategy 
might  or  might  not  complete  within  the  time  limit  in  different  executions.  Clearly, 
the  order  in  which  this  strategy  tries  to  eliminate  predicates  in  each  iteration  is 
very critical to its success. Given that the strategy performs poorly on most of our
benchmarks using a random ordering, more sophisticated ordering techniques may
perform better.

Table 7.4: Results for optimality. ELM = MAXELM, SUB = MAXSUB, Ti = Time in seconds,
It = number of iterations, Br = number of branches, M = Memory, T = total number of
eliminating subsets generated, and G = maximum size of any eliminating subset generated.


7.5.4  Optimality 

We  experimented  with  two  of  the  parameters  that  affect  the  optimality  of  our 
predicate  minimization  algorithm:  (i)  the  maximum  number  of  examined  subsets 
(MAXSUB)  and  (ii)  the  maximum  number  of  eliminating  subsets  generated  (MAXELM) 
(that is, the procedure stops the search once MAXELM eliminating subsets have been found,
even if fewer than MAXSUB combinations were tried). We first kept MAXSUB fixed and
took  measurements  for  different  values  of  MAXELM  on  a  subset  of  our  benchmarks 
viz.  ssl-srvr-4,  ssl-srvr-15  and  ssl-clnt-1.  Our  results,  shown  in  Table  7.4,  clearly  indicate 
that  the  optimality  is  practically  unaffected  by  the  value  of  MAXELM. 

Next we experimented with different values of MAXSUB (the value of MAXELM
was set equal to MAXSUB). The results we obtained are summarized in Table 7.5. It
appears that, at least for our benchmarks, increasing MAXSUB leads only to increased
execution time without reduced memory consumption or number of predicates. The
additional number of combinations attempted or constraints allowed does not lead
to improved optimality. The most probable reason is that, as shown by our results,
even though we are trying more combinations, the actual number or maximum size of
eliminating combinations generated does not increase significantly. Indeed, if this is
a feature of most real-life programs, it would allow us, in most cases, to achieve near
optimality by trying out only a small number of combinations or only combinations
of small size.

Table 7.5: Results for optimality. SUB = MAXSUB, Time is in seconds, It = number of
iterations, Br = number of branches, T = total number of eliminating subsets generated, M =
maximum size of subsets tried, and G = maximum size of eliminating subsets generated.

Chapter 8


State-Event  Temporal  Logic 


In  this  chapter,  we  present  an  expressive  linear  temporal  logic  called  SE-LTL  for 
specifying  state-event  based  properties.  We  also  discuss  a  compositional  CEGAR 
framework  for  verifying  concurrent  software  systems  against  SE-LTL  formulas. 
Control  systems  ranging  from  smart  cards  to  automated  flight  controllers  are 
increasingly  being  incorporated  within  complex  software  systems.  In  many  instances, 
errors  in  such  systems  can  have  catastrophic  consequences,  hence  the  urgent  need  to 
be  able  to  ensure  and  guarantee  their  correctness.  In  this  endeavor,  the  well-known 
methodology  of  model  checking  [32,35,39,99]  holds  much  promise.  Although  most 
of  its  early  applications  dealt  with  hardware  and  communication  protocols,  model 
checking  is  increasingly  being  used  to  verify  software  systems  [5,6,13,23,24,44,65, 
66,98,107,111,112]  as  well. 

Unfortunately, applying model checking to software is complicated by several
factors, ranging from the difficulty of modeling computer programs (due to the
complexity of programming languages as compared to hardware description
languages) to difficulties in specifying meaningful properties of software using the
usual temporal logical formalisms of model checking. A third reason is the state space
explosion problem, whereby the complexity of verifying an implementation against a
specification becomes prohibitive.

The  most  common  instantiations  of  model  checking  to  date  have  focused  on  finite- 
state  models  and  either  branching-time  (CTL  [32])  or  linear  (LTL  [78])  temporal 
logics.  To  apply  model  checking  to  software,  it  is  necessary  to  specify  (often  complex) 
properties  on  the  finite-state  abstracted  models  of  computer  programs.  The  difficulties 
in  doing  so  are  even  more  pronounced  when  reasoning  about  modular  software, 
such  as  concurrent  or  component-based  sequential  programs.  Indeed,  in  modular 
programs,  communication  among  modules  proceeds  via  actions  (or  events),  which 
can  represent  function  calls,  requests  and  acknowledgments,  etc.  Moreover,  such 
communication  is  commonly  data-dependent.  Software  behavioral  claims,  therefore, 
are  often  specifications  defined  over  combinations  of  program  actions  and  data 
valuations. 

Existing  modeling  techniques  usually  represent  finite-state  machines  as  finite 
annotated  directed  graphs,  using  either  state-based  or  event-based  formalisms.  It  is 
well-known  that  theoretically  the  two  frameworks  are  interchangeable.  For  instance, 
an  action  can  be  encoded  as  a  change  in  a  state  variable,  and  likewise  one  can 
equip  a  state  with  different  actions  to  reflect  different  values  of  its  internal  variables. 
However,  converting  from  one  representation  to  the  other  often  leads  to  a  significant 
enlargement  of  the  state  space.  Moreover,  neither  approach  on  its  own  is  practical 
when  it  comes  to  modular  software,  in  which  actions  are  often  data-dependent: 
considerable  domain  expertise  is  then  required  to  annotate  the  program  and  to  specify 
proper  claims. 

This chapter, therefore, presents a framework in which both state-based and
action-based properties can be expressed, combined, and verified. The modeling
framework  consists  of  labeled  Kripke  structures  (LKS),  which  are  directed  graphs  in 
which  states  are  labeled  with  atomic  propositions  and  transitions  are  labeled  with 
actions.  The  specification  logic  is  a  state/event  derivative  of  LTL.  This  allows 
us  to  represent  both  software  implementations  and  specifications  directly  without 
any  program  annotations  or  privileged  insights  into  program  execution.  We  further 
show  that  an  efficient  model  checking  algorithm  can  be  applied  to  help  reason  about 
state/event-based  systems.  Significantly,  our  model  checking  algorithm  operates 
directly  on  the  LKS  models  and  is  thus  able  to  avoid  the  extra  cost  involved  in 
translating state/event systems into systems that are based purely on either states or
events. We have implemented our approach within the C verification tool MAGIC [23,
24, 80], and report promising results on the examples which we have tackled.

The state/event-based formalism presented here is suitable for both sequential and
concurrent systems, and is also amenable to the compositional abstraction refinement
procedures [23] presented earlier. These procedures are embedded within a CEGAR
framework [37], one of the core features of MAGIC. CEGAR lets us investigate the
validity of a given specification through a sequence of increasingly refined abstractions
of our system, until the property is either established or a real counterexample
is found. Moreover, thanks to compositionality, the abstraction, counterexample
validation, and refinement steps can all be carried out component-wise, thereby
alleviating the need to build the full state space of the distributed system.

We illustrate our state/event paradigm with a current surge protector example,
and conduct further experiments with the source code for OpenSSL and μC/OS-II (a
real-time operating system for embedded applications). In the case of the latter, we
discovered several bugs, one of which was unknown to the developers of μC/OS-II. We
contrast our approach with equivalent pure state-based and event-based alternatives,
and demonstrate that the state/event methodology yields significant gains in state
space size and verification time.

This  chapter  is  organized  as  follows.  In  Section  8.1,  we  review  and  discuss  related 
work.  Section  8.2  presents  the  basic  definitions  and  results  needed  for  the  presentation 
of  our  compositional  CEGAR  verification  algorithm.  In  Section  8.3,  we  present 
our  state/event  specification  formalism,  based  on  linear  temporal  logic.  We  review 
standard  automata-theoretic  model  checking  techniques,  and  show  how  these  can  be 
adapted  to  the  verification  task  at  hand. 

In  Section  8.4,  we  illustrate  these  ideas  by  modeling  a  simple  surge  protector. 
We  also  contrast  our  approach  with  pure  state-based  and  event-based  alternatives, 
and  show  that  both  the  resulting  implementations  and  specifications  are  significantly 
more cumbersome. We then use MAGIC to check these specifications, and discover
that  the  non-state/event  formalisms  incur  significant  time  and  space  penalties  during 
verification. 

Section  8.5  details  our  SE-LTL  verification  algorithm  for  C  programs  while 
Section  8.6  presents  the  complete  compositional  CEGAR  scheme  for  SE-LTL 
verification  of  concurrent  C  programs.  Finally,  in  Section  8.7,  we  report  on  case 
studies  in  which  we  checked  specifications  on  the  source  code  for  OpenSSL  and 
μC/OS-II, which led us to the discovery of a bug in the latter.

8.1  Related  Work 

Counterexample guided abstraction refinement [37,76], or CEGAR, is an iterative
procedure whereby spurious counterexamples to a specification are repeatedly
eliminated through incremental refinements of a conservative abstraction of the
system. CEGAR has been used, among others, in [90] (in non-automated form),
and [6,23,29,40,66,77,98].

Compositionality,  which  features  centrally  in  our  work,  is  broadly  concerned  with 
the  preservation  of  properties  under  substitution  of  components  in  concurrent  systems. 
It  has  been  extensively  studied,  among  others,  in  process  algebra  (e.g.,  [69, 85, 100]),  in 
temporal  logic  model  checking  [64],  and  in  the  form  of  assume-guarantee  reasoning  [41, 
67,84]. The combination of CEGAR and compositional reasoning is a relatively
new approach. In [10], a compositional framework for (non-automated) CEGAR
over data-based abstractions is presented. This approach differs from ours in that
communication takes place through shared variables (rather than blocking message-passing),
and abstractions are refined by eliminating spurious transitions, rather than
by splitting abstract states.

The idea of combining state-based and event-based formalisms is certainly not
new [14]. De Nicola and Vaandrager [94], for instance, introduce doubly labeled
transition systems, which are very similar to our LKSs. From the point of view
of expressiveness, our state/event version of LTL is also subsumed by the modal mu-calculus
[16, 74, 97], via a translation of LTL formulas into Büchi automata. However,
all these approaches are restricted to finite state systems.

Kindler and Vesper [73] propose a state/event-based temporal logic for Petri
nets. They motivate their approach by arguing, as we do, that pure state-based
or event-based formalisms lack expressiveness in important respects. Huth et al. [72]
also propose a state/event framework, and define rich notions of abstraction and
refinement. In addition, they provide may and must modalities for transitions, and
show how to perform efficient three-valued verification on such structures. They do
not, however, provide an automated CEGAR framework, and it is not clear whether
they have implemented and tested their approach. Giannakopoulou and Magee [62]
define fluent propositions within a labeled transition system (LTS, essentially an LKS
with an empty set of atomic propositions) context to express action-based linear-time
properties. A fluent proposition is a property that holds after it is initiated by an
action and ceases to hold when terminated by another action. This work exploits
partial-order reduction techniques and has been implemented in the LTSA tool.

In a comparatively early paper, De Nicola et al. [93] propose a process
algebraic  framework  with  an  action-based  version  of  CTL  as  specification  formalism. 
Verification  then  proceeds  by  first  translating  the  underlying  LTSs  of  processes  into 
Kripke  structures  and  the  action-based  CTL  specifications  into  equivalent  state-based 
CTL  formulas.  At  that  point,  a  model  checker  is  used  to  establish  or  refute  the 
property.  Dill  [55,56]  defines  trace  structures  as  algebraic  objects  to  model  both 
hardware  circuits  and  their  specifications.  Trace  structures  can  handle  equally  well 
states  or  events,  although  usually  not  both  at  the  same  time.  Dill’s  approach  to 
verification  is  based  on  abstractions  and  compositional  reasoning,  albeit  without  an 
iterative  counterexample-driven  refinement  loop. 

In  general,  events  (input  signals)  in  circuits  can  be  encoded  via  changes  in  state 
variables.  Browne  makes  use  of  this  idea  in  [18],  which  features  a  CTL*  specification 
formalism.  Browne’s  framework  also  features  abstractions  and  compositional 
reasoning,  in  a  manner  similar  to  Dill’s.  Burch  [20]  extends  the  idea  of  trace 
structures into a full-blown theory of trace algebra. The focus here, however, is the
modeling of discrete and continuous time, and the relationship between these two
paradigms. This work also exploits abstractions and compositionality, but once
again without automated counterexample guided refinements. Finally, Bultan [19]
proposes an intermediate specification language lying between high-level Statechart-like
formalisms and transition systems. Actions are encoded as changes in state
variables in a framework which also focuses on exploiting compositionality in model
checking.


8.2  Preliminaries 

Recall, from Definition 1, that an LKS is a 6-tuple $(S, Init, AP, L, \Sigma, T)$. In this
section we present a few preliminary definitions that will be used in the rest of the
chapter.

Definition 20 (Infinite Path and Trace) Let $M$ be an LKS. An infinite path of
$M$ is an infinite sequence $\langle s_0, \alpha_0, s_1, \alpha_1, \ldots \rangle$ such that: (i) $s_0 \in Init_M$ and (ii)
$\forall i \geq 0 \cdot s_i \xrightarrow{\alpha_i}_M s_{i+1}$. In such a case, the infinite sequence $\langle L_M(s_0), \alpha_0, L_M(s_1), \alpha_1, \ldots \rangle$
is called an infinite trace of $M$.
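
The definition above can be made concrete with a small data structure. The following is a minimal sketch, assuming states and actions are plain strings; the class name `LKS`, its field names, and the two-state example are hypothetical and chosen only for illustration. Since an infinite path cannot be stored directly, the sketch checks finite prefixes of paths.

```python
from dataclasses import dataclass


@dataclass
class LKS:
    """Labeled Kripke structure (S, Init, AP, L, Sigma, T); names are illustrative."""
    states: frozenset
    init: frozenset
    ap: frozenset
    label: dict        # state -> frozenset of atomic propositions
    sigma: frozenset   # alphabet of actions
    trans: frozenset   # set of (source, action, target) triples

    def is_total(self):
        """True iff every state has at least one outgoing transition."""
        sources = {s for (s, _, _) in self.trans}
        return sources == set(self.states)

    def is_path_prefix(self, seq):
        """Check that [s0, a0, s1, a1, ..., sk] is a finite prefix of a path."""
        if not seq or len(seq) % 2 == 0 or seq[0] not in self.init:
            return False
        steps = zip(seq[0::2], seq[1::2], seq[2::2])
        return all(step in self.trans for step in steps)

    def trace_prefix(self, seq):
        """Replace every state of a path prefix by its label; keep the actions."""
        return [self.label[x] if i % 2 == 0 else x for i, x in enumerate(seq)]


# Hypothetical two-state LKS, invented for this example.
M = LKS(
    states=frozenset({"s0", "s1"}),
    init=frozenset({"s0"}),
    ap=frozenset({"p", "q"}),
    label={"s0": frozenset({"p"}), "s1": frozenset({"q"})},
    sigma=frozenset({"a", "b"}),
    trans=frozenset({("s0", "a", "s1"), ("s1", "b", "s0"), ("s1", "a", "s1")}),
)
```

Here `M.is_path_prefix(["s0", "a", "s1", "b", "s0"])` follows two transitions of `M` starting from its initial state, and `M.trace_prefix` turns such a prefix into the corresponding trace prefix.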

In the rest of this chapter we restrict our attention to infinite paths and
traces. We will also assume that the transition relation of every LKS is total, i.e.,
every  state  has  at  least  one  outgoing  transition.  The  notion  of  paths  leads  naturally 
to  that  of  languages. 

Definition 21 (Language) Let $M$ be an LKS. The language of $M$, denoted by
$\mathcal{L}(M)$, is defined as: $\mathcal{L}(M) = \{\pi \mid \pi \text{ is a path of } M\}$.

8.3  The  Logic  SE-LTL

We now present a logic enabling us to refer easily to both states and events when
constructing specifications. Given an LKS $M$, we consider linear temporal logic
state/event formulas over the sets $AP_M$ and $\Sigma_M$. Suppose $p$ ranges over $AP_M$ and $a$
ranges over $\Sigma_M$. Then the syntax of SE-LTL can be defined inductively as follows:

$\phi ::= p \mid a \mid \neg\phi \mid \phi \wedge \phi \mid \mathbf{X}\phi \mid \mathbf{G}\phi \mid \mathbf{F}\phi \mid \phi \mathbin{\mathbf{U}} \phi$

We write SE-LTL to denote the resulting logic, and in particular to distinguish it
from (standard) LTL. Let $\pi = \langle s_0, \alpha_0, s_1, \alpha_1, \ldots \rangle$ be a path of $M$. For $i \geq 0$, let $\pi^i$
denote the suffix of $\pi$ starting in state $s_i$. We then inductively define path-satisfaction
of SE-LTL formulas as follows:

$\pi \models p$ iff $p \in L_M(s_0)$

$\pi \models a$ iff $a = \alpha_0$

$\pi \models \neg\phi$ iff $\pi \not\models \phi$

$\pi \models \phi_1 \wedge \phi_2$ iff $\pi \models \phi_1$ and $\pi \models \phi_2$

$\pi \models \mathbf{X}\phi$ iff $\pi^1 \models \phi$

$\pi \models \mathbf{G}\phi$ iff $\forall i \geq 0 \cdot \pi^i \models \phi$

$\pi \models \mathbf{F}\phi$ iff $\exists i \geq 0 \cdot \pi^i \models \phi$

$\pi \models \phi_1 \mathbin{\mathbf{U}} \phi_2$ iff $\exists i \geq 0 \cdot \pi^i \models \phi_2$ and $\forall\, 0 \leq j < i \cdot \pi^j \models \phi_1$

We then let $M \models \phi$ iff, for every path $\pi \in \mathcal{L}(M)$, $\pi \models \phi$. We also use the derived
(weak until) $\mathbf{W}$ operator: $\phi_1 \mathbin{\mathbf{W}} \phi_2 = (\mathbf{G}\phi_1) \vee (\phi_1 \mathbin{\mathbf{U}} \phi_2)$. As a simple example,
consider the following LKS $M$. It has two states, the leftmost of which is the sole
initial state. Its set of atomic state propositions is $\{p, q, r\}$; the first state is labeled
with $\{p, q\}$ and the second with $\{q, r\}$. $M$'s transitions are similarly labeled with sets
of events drawn from the alphabet $\{a, b, c, d\}$.

[Figure: the two-state LKS $M$; the only transition label recoverable from the diagram is "a,b"]

As the reader may easily verify, $M \models \mathbf{G}(c \Rightarrow \mathbf{F}r)$ but $M \not\models \mathbf{G}(b \Rightarrow \mathbf{F}r)$.
Note also that $M \models \mathbf{G}(d \Rightarrow \mathbf{F}r)$, but $M \not\models \mathbf{G}(d \Rightarrow \mathbf{X}\mathbf{F}r)$.
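
The path-satisfaction clauses of SE-LTL can be executed directly on ultimately periodic paths, that is, paths of the form stem followed by a loop repeated forever. The following is a minimal sketch under that assumption; the tuple encoding of formulas, the function names `holds` and `implies`, and the example path are hypothetical and are not taken from the LKS $M$ discussed above. Each path position is a pair consisting of the labels of the state and the action taken on leaving it.

```python
def implies(f, g):
    """f => g, expressed with the primitive connectives as not(f and not g)."""
    return ("not", ("and", f, ("not", g)))


def holds(formula, stem, loop, i=0):
    """Decide whether the suffix pi^i of the path stem . loop^omega satisfies formula.

    stem and loop are lists of (labels, action) pairs; loop must be non-empty.
    Formulas are nested tuples: ("ap", p), ("ev", a), ("not", f), ("and", f, g),
    ("X", f), ("G", f), ("F", f), ("U", f, g).
    """
    path = stem + loop
    n = len(path)

    def nxt(j):
        # the position after the last loop position is the first loop position
        return j + 1 if j + 1 < n else len(stem)

    def future(j):
        # positions j, j+1, ...; every distinct suffix shows up within n steps
        out = []
        for _ in range(n):
            out.append(j)
            j = nxt(j)
        return out

    def sat(f, j):
        op = f[0]
        labels, action = path[j]
        if op == "ap":
            return f[1] in labels
        if op == "ev":
            return f[1] == action
        if op == "not":
            return not sat(f[1], j)
        if op == "and":
            return sat(f[1], j) and sat(f[2], j)
        if op == "X":
            return sat(f[1], nxt(j))
        if op == "G":
            return all(sat(f[1], k) for k in future(j))
        if op == "F":
            return any(sat(f[1], k) for k in future(j))
        if op == "U":
            for k in future(j):
                if sat(f[2], k):
                    return True
                if not sat(f[1], k):
                    return False
            return False
        raise ValueError(f"unknown operator: {op}")

    return sat(formula, i)


# Hypothetical path: one stem position followed by a one-position loop.
stem = [(frozenset({"p", "q"}), "c")]
loop = [(frozenset({"q", "r"}), "d")]
```

On this example path, `holds(("G", implies(("ev", "c"), ("F", ("ap", "r")))), stem, loop)` evaluates the formula $\mathbf{G}(c \Rightarrow \mathbf{F}r)$. Checking $M \models \phi$ for a whole LKS requires considering all paths, which is what the automata-based procedure of the following sections addresses.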

8.3.1  Automata-based  Verification 

We  aim  to  reduce  SE-LTL  verification  problems  to  standard  automata-theoretic 
techniques  for  LTL.  Note  that  a  standard — but  unsatisfactory — way  of  achieving  this 
is  to  explicitly  encode  actions  through  changes  in  (additional)  state  variables,  and 
then  proceed  with  LTL  verification.  Unfortunately,  this  technique  usually  leads  to  a 
significant  blow-up  in  the  state  space,  and  consequently  yields  much  larger  verification 
times.  The  approach  we  present  here,  on  the  other  hand,  does  not  alter  the  size  of  the 
LKS,  and  is  therefore  considerably  more  efficient.  We  first  recall  some  basic  results 
about  LTL,  Kripke  structures,  and  automata-based  verification. 

A  Kripke  structure  is  simply  an  LKS  minus  the  alphabet  and  the  transition- 
labeling  function;  as  for  LKSs,  the  transition  relation  of  a  Kripke  structure  is  required 
to  be  total.  An  LTL  formula  is  an  SE-LTL  formula  which  makes  no  use  of  events  as 
atomic  propositions. 

Definition 22 (Kripke Structure) A Kripke Structure (KS for short) is a 5-tuple
$(S, Init, AP, L, T)$ where: (i) $S$ is a non-empty set of states, (ii) $Init \subseteq S$ is a set
of initial states, (iii) $AP$ is a set of atomic propositions, (iv) $L : S \rightarrow 2^{AP}$ is a
propositional labeling function that maps every state to a set of atomic propositions
that are true in that state, and (v) $T \subseteq S \times S$ is a total transition relation.

The notions of paths, traces and languages for Kripke Structures are analogous to
those for Labeled Kripke Structures.

Definition 23 (Path and Trace) Let $M$ be a KS. A path of $M$ is an infinite
sequence $\langle s_0, s_1, \ldots \rangle$ such that: (i) $s_0 \in Init_M$ and (ii) $\forall i \geq 0 \cdot s_i \rightarrow_M s_{i+1}$. In
such a case, the infinite sequence $\langle L_M(s_0), L_M(s_1), \ldots \rangle$ is called a trace of $M$.

Definition 24 (Language) Let $M$ be a KS. The language of $M$, denoted by $\mathcal{L}(M)$,
is defined as: $\mathcal{L}(M) = \{\pi \mid \pi \text{ is a path of } M\}$.

8.3.2  The  Logic  LTL 

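Given a KS $M$, we consider linear temporal logic (LTL) formulas over the set $AP_M$.
Suppose $p$ ranges over $AP_M$. Then the syntax of LTL can be defined inductively as
follows:

$\phi ::= p \mid \neg\phi \mid \phi \wedge \phi \mid \mathbf{X}\phi \mid \mathbf{G}\phi \mid \mathbf{F}\phi \mid \phi \mathbin{\mathbf{U}} \phi$

Let $\pi = \langle s_0, s_1, \ldots \rangle$ be a path of $M$. For $i \geq 0$, let $\pi^i$ denote the suffix of $\pi$ starting
in state $s_i$. We then inductively define path-satisfaction of LTL formulas as follows:

$\pi \models p$ iff $p \in L_M(s_0)$

$\pi \models \neg\phi$ iff $\pi \not\models \phi$

$\pi \models \phi_1 \wedge \phi_2$ iff $\pi \models \phi_1$ and $\pi \models \phi_2$

$\pi \models \mathbf{X}\phi$ iff $\pi^1 \models \phi$

$\pi \models \mathbf{G}\phi$ iff $\forall i \geq 0 \cdot \pi^i \models \phi$

$\pi \models \mathbf{F}\phi$ iff $\exists i \geq 0 \cdot \pi^i \models \phi$

$\pi \models \phi_1 \mathbin{\mathbf{U}} \phi_2$ iff $\exists i \geq 0 \cdot \pi^i \models \phi_2$ and $\forall\, 0 \leq j < i \cdot \pi^j \models \phi_1$

We then let $M \models \phi$ iff, for every path $\pi \in \mathcal{L}(M)$, $\pi \models \phi$. We also use the derived
(weak until) $\mathbf{W}$ operator: $\phi_1 \mathbin{\mathbf{W}} \phi_2 = (\mathbf{G}\phi_1) \vee (\phi_1 \mathbin{\mathbf{U}} \phi_2)$. We now define a Büchi
automaton formally.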

Definition 25 (Büchi Automaton) A Büchi Automaton (BA for short) is a 6-tuple
$(S, Init, AP, L, T, Acc)$ where: (i) $S$ is a finite set of states, (ii) $Init \subseteq S$
is a set of initial states, (iii) $AP$ is a finite set of atomic state propositions, (iv)
$L : S \rightarrow 2^{2^{AP}}$ is a state-labeling function, (v) $T \subseteq S \times S$ is a transition relation, and
(vi) $Acc \subseteq S$ is a set of accepting states.

Note that the transition relation is not required to be total, and is moreover
unlabeled. Note also that the states of a Büchi automaton are labeled with arbitrary
functions over the set of atomic propositions $AP$. The idea is that each state of the
BA corresponds to a set of valuations to $AP$. For example, suppose $AP = \{p_1, p_2, p_3\}$
and suppose that a state $s$ of the BA is labeled with the function $p_1 \wedge \neg p_2$. Then
the state $s$ can match any state of a Kripke structure where $p_1$ is true and $p_2$ is
false (the value of $p_3$ can be either true or false). Thus a labeling of the state of
a BA is a constraint and not an actual valuation to all the propositions. The ability
to specify constraints enables us to construct Büchi automata with fewer
states. The notions of paths and languages for Büchi automata are quite natural and
we define them formally next.
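
To make the constraint reading concrete, the label of the example state $s$ can be written out as an element of $2^{2^{AP}}$. The following worked expansion assumes that a valuation is represented by the set of propositions it makes true, which is consistent with the type $L : S \rightarrow 2^{2^{AP}}$ of Definition 25.

```latex
\[
L(s) \;=\; \{\, v \subseteq AP \mid p_1 \in v \,\wedge\, p_2 \notin v \,\}
     \;=\; \{\, \{p_1\},\; \{p_1, p_3\} \,\}
\]
```

A single BA state labeled this way therefore stands for two of the eight possible valuations of $\{p_1, p_2, p_3\}$.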

Definition 26 (Path and Language) Let $B$ be a BA. A path of $B$ is an infinite
sequence $\langle s_0, s_1, \ldots \rangle$ such that: (i) $s_0 \in Init_B$ and (ii) $\forall i \geq 0 \cdot s_i \rightarrow_B s_{i+1}$. For a
path $\pi$ of $B$ we denote by $Inf(\pi) \subseteq S_B$ the set of states of $B$ which occur infinitely
often in $\pi$. Then the language of $B$, denoted by $\mathcal{L}(B)$, is defined as: $\mathcal{L}(B) = \{\pi \mid \pi$
is a path of $B \wedge Inf(\pi) \cap Acc_B \neq \emptyset\}$.

8.3.3  Product  Automaton 

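Let $M$ be a Kripke structure and $B$ be a Büchi automaton such that $AP_M = AP_B$.
We define the standard product $M \times B$ as a Büchi automaton such that:

• $S_{M \times B} = \{(s, b) \in S_M \times S_B \mid L_M(s) \in L_B(b)\}$

• $Init_{M \times B} = \{(s, b) \in S_{M \times B} \mid s \in Init_M \wedge b \in Init_B\}$

• $AP_{M \times B} = AP_B$ and $\forall (s, b) \in S_{M \times B} \cdot L_{M \times B}(s, b) = L_B(b)$

• $\forall (s, b) \in S_{M \times B} \cdot \forall (s', b') \in S_{M \times B} \cdot (s, b) \rightarrow_{M \times B} (s', b')$ iff $s \rightarrow_M s'$ and $b \rightarrow_B b'$

• $Acc_{M \times B} = \{(s, b) \in S_{M \times B} \mid b \in Acc_B\}$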

The non-symmetrical standard product $M \times B$ accepts exactly those paths of $M$
which are consistent with $B$. Its main technical use lies in the following result of
Gerth et al. [61]:

Theorem 16 Given a Kripke structure $M$ and LTL formula $\phi$, there is a Büchi
automaton $B_{\neg\phi}$ such that $M \models \phi \iff \mathcal{L}(M \times B_{\neg\phi}) = \emptyset$.

Theorem 16 is the core result that enables efficient LTL model checking. Given a
Kripke structure $M$ and an LTL formula $\phi$, we first construct the Büchi automaton $B_{\neg\phi}$.
We then check if $\mathcal{L}(M \times B_{\neg\phi}) = \emptyset$ using a highly optimized double-depth-first-search
algorithm [46,71]. Finally, if $\mathcal{L}(M \times B_{\neg\phi}) = \emptyset$, we conclude that $M \models \phi$.
Otherwise we conclude that $M \not\models \phi$.
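
The emptiness check amounts to asking whether some accepting state is reachable from an initial state and lies on a cycle. The following is a minimal sketch of one textbook formulation of nested depth-first search; it is not claimed to be the optimized variant of [46,71], and the function name and the successor-function interface are assumptions made for illustration. The recursive form is kept for brevity; large state spaces would call for an iterative version.

```python
def accepting_cycle_exists(init, succ, accepting):
    """Return True iff an accepting state is reachable from init and lies on a cycle.

    init: iterable of initial states; succ: function mapping a state to its
    successors; accepting: set of accepting states.
    """
    blue, red = set(), set()

    def dfs_red(s, seed):
        # inner search: look for a path from the accepting seed back to itself
        red.add(s)
        for t in succ(s):
            if t == seed:
                return True
            if t not in red and dfs_red(t, seed):
                return True
        return False

    def dfs_blue(s):
        # outer search: explore all successors first (post-order)
        blue.add(s)
        for t in succ(s):
            if t not in blue and dfs_blue(t):
                return True
        return s in accepting and dfs_red(s, s)

    return any(s not in blue and dfs_blue(s) for s in init)


# Hypothetical graph: 0 -> 1 -> 2 -> 1, and an isolated state 3.
graph = {0: [1], 1: [2], 2: [1], 3: []}
```

With `succ = lambda s: graph[s]`, the call `accepting_cycle_exists([0], succ, {2})` reports a non-empty language because state 2 lies on the cycle between 1 and 2, whereas choosing `{0}` as the accepting set does not.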

An efficient tool to convert LTL formulas into optimized Büchi automata is
Somenzi and Bloem's Wring [108, 113]. We now turn our attention back to labeled
Kripke structures. Recall that SE-LTL formulas allow events in $\Sigma_M$ to stand for
atomic propositions. Therefore, given an SE-LTL formula $\phi$ over $AP_M$ and $\Sigma_M$, we
can interpret $\phi$ as an LTL formula over $AP_M \cup \Sigma_M$; let us denote the latter formula
by $\phi^{\flat}$. $\phi^{\flat}$ is therefore syntactically identical to $\phi$, but differs from $\phi$ in its semantic
interpretation.

8.3.4  State/Event  Product 

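We now define the state/event product of a labeled Kripke structure with a Büchi
automaton. Let $M$ be an LKS, and $B$ be a Büchi automaton such that $AP_B =
AP_M \cup \Sigma_M$. The state/event product $M \otimes B$ is a Büchi automaton that satisfies the
following conditions:

• $S_{M \otimes B} = \{(s, b) \in S_M \times S_B \mid \exists \alpha \in \Sigma_M \cdot L_M(s) \cup \{\alpha\} \in L_B(b)\}$

• $Init_{M \otimes B} = \{(s, b) \in S_{M \otimes B} \mid s \in Init_M \wedge b \in Init_B\}$

• $AP_{M \otimes B} = AP_B$ and $\forall (s, b) \in S_{M \otimes B} \cdot L_{M \otimes B}(s, b) = L_B(b)$

• $\forall (s, b) \in S_{M \otimes B} \cdot \forall (s', b') \in S_{M \otimes B} \cdot (s, b) \rightarrow_{M \otimes B} (s', b')$ iff
$\exists \alpha \in \Sigma_M \cdot s \xrightarrow{\alpha}_M s' \wedge b \rightarrow_B b' \wedge (L_M(s) \cup \{\alpha\}) \in L_B(b)$

• $Acc_{M \otimes B} = \{(s, b) \in S_{M \otimes B} \mid b \in Acc_B\}$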
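
The conditions above translate almost line by line into set comprehensions. The following is a minimal sketch, assuming that the LKS and the Büchi automaton are given as plain dictionaries and that each Büchi label is represented by a predicate over valuations rather than by an explicit set of valuations. The function name `se_product`, the dictionary keys, and the one-state example are hypothetical.

```python
def se_product(lks, ba):
    """Build the state/event product of an LKS and a Büchi automaton.

    lks: dict with "states", "init", "sigma", "label" (state -> frozenset),
         and "trans" (set of (s, action, s2) triples).
    ba:  dict with "states", "init", "acc", "label" (state -> predicate on a
         frozenset of propositions and events), and "trans" (set of (b, b2)).
    """
    def matches(s, a, b):
        # L_M(s) ∪ {a} must satisfy the constraint labeling b
        return ba["label"][b](lks["label"][s] | {a})

    states = {(s, b) for s in lks["states"] for b in ba["states"]
              if any(matches(s, a, b) for a in lks["sigma"])}
    init = {(s, b) for (s, b) in states
            if s in lks["init"] and b in ba["init"]}
    trans = {((s, b), (s2, b2))
             for (s, a, s2) in lks["trans"]
             for (b, b2) in ba["trans"]
             if (s, b) in states and (s2, b2) in states and matches(s, a, b)}
    acc = {(s, b) for (s, b) in states if b in ba["acc"]}
    return states, init, trans, acc


# Hypothetical LKS: a single state that can perform either event a or event b.
lks = {
    "states": {"s"}, "init": {"s"}, "sigma": {"a", "b"},
    "label": {"s": frozenset({"p"})},
    "trans": {("s", "a", "s"), ("s", "b", "s")},
}
# Hypothetical automaton for "eventually b": q1 demands the event b right now.
ba = {
    "states": {"q0", "q1", "q2"}, "init": {"q0", "q1"}, "acc": {"q2"},
    "label": {"q0": lambda v: True, "q1": lambda v: "b" in v, "q2": lambda v: True},
    "trans": {("q0", "q0"), ("q0", "q1"), ("q1", "q2"), ("q2", "q2")},
}
```

Note that the product has at most one copy of each LKS state per automaton state; no state of the LKS is split according to its outgoing events.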

The  usefulness  of  a  state/event  product  is  captured  by  the  following  theorem. 
Note  that  the  state/event  product  does  not  require  an  enlargement  of  the  LKS  M, 
even  though  we  consider  below  just  such  an  enlargement  in  the  course  of  the  proof  of 
Theorem  17. 

Theorem 17 For any LKS $M$ and SE-LTL formula $\phi$, the following holds:

$M \models \phi \iff \mathcal{L}(M \otimes B_{\neg\phi^{\flat}}) = \emptyset$

Proof. Observe that a state of $M$ can have several differently-labeled transitions
emanating from it. However, by duplicating states (and transitions) as necessary,
we can transform $M$ into another LKS $M'$ having the following two properties: (i)
$\mathcal{L}(M') = \mathcal{L}(M)$, and (ii) for every state $s$ of $M'$, the transitions emanating from $s$
are all labeled with the same action. As a result, the validity of an SE-LTL atomic
event proposition $a$ in a given state of $M'$ does not depend on the particular path
to be taken from that state, and can therefore be recorded as a propositional state
variable of the state itself. Formally, this gives rise to a Kripke structure $M''$ over
atomic state propositions $AP_M \cup \Sigma_M$. We now claim that:

$\mathcal{L}(M \otimes B_{\neg\phi^{\flat}}) = \emptyset \iff \mathcal{L}(M'' \times B_{\neg\phi^{\flat}}) = \emptyset$.  (8.1)

To see this, notice first that there is a bijection $h : \mathcal{L}(M) \rightarrow \mathcal{L}(M'')$.
Next, observe that any path in $\mathcal{L}(M \otimes B_{\neg\phi^{\flat}})$ can be decomposed as a pair $(\pi, \beta)$,
where $\pi \in \mathcal{L}(M)$ and $\beta \in \mathcal{L}(B_{\neg\phi^{\flat}})$; likewise, any path in $\mathcal{L}(M'' \times B_{\neg\phi^{\flat}})$ can be
decomposed as a pair $(\pi'', \beta)$, where $\pi'' \in \mathcal{L}(M'')$ and $\beta \in \mathcal{L}(B_{\neg\phi^{\flat}})$. A straightforward
inspection of the relevant definitions then reveals that $(\pi, \beta) \in \mathcal{L}(M \otimes B_{\neg\phi^{\flat}})$ iff
$(h(\pi), \beta) \in \mathcal{L}(M'' \times B_{\neg\phi^{\flat}})$, which establishes our claim.

Finally, we clearly have $M \models \phi$ iff $M' \models \phi$ iff $M'' \models \phi^{\flat}$. Combining this with
Theorem 16 and Equation 8.1 above, we get $M \models \phi \iff \mathcal{L}(M \otimes B_{\neg\phi^{\flat}}) = \emptyset$, as
required.

□
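
The enlargement used in the proof can be illustrated by a small sketch. It is one possible reading of the construction, assuming that each state $s$ is split into one copy $(s, a)$ per action $a$ leaving $s$, with the copy labeled $L(s) \cup \{a\}$; the function name `flatten` and the example are hypothetical. The sketch is meant only to show why such an encoding increases the number of states, which the state/event product avoids.

```python
def flatten(label, init, trans):
    """Encode an LKS as a Kripke structure over AP ∪ Sigma by splitting states.

    label: state -> frozenset of propositions; init: set of initial states;
    trans: set of (source, action, target) triples of a total LKS.
    """
    outgoing = {}
    for (s, a, _) in trans:
        outgoing.setdefault(s, set()).add(a)
    ks_states = {(s, a) for s, acts in outgoing.items() for a in acts}
    # the event chosen in a copy is recorded as an extra proposition
    ks_label = {(s, a): label[s] | {a} for (s, a) in ks_states}
    ks_init = {(s, a) for (s, a) in ks_states if s in init}
    ks_trans = {((s, a), (t, b))
                for (s, a, t) in trans
                for b in outgoing.get(t, ())}
    return ks_states, ks_init, ks_label, ks_trans


# Hypothetical LKS with 2 states and 3 transitions.
label = {"s0": frozenset({"p"}), "s1": frozenset({"q"})}
init = {"s0"}
trans = {("s0", "a", "s1"), ("s0", "b", "s0"), ("s1", "a", "s1")}
ks_states, ks_init, ks_label, ks_trans = flatten(label, init, trans)
```

In this example the two-state LKS becomes a Kripke structure with three states and four transitions, because `s0` has two differently labeled outgoing transitions.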

The  significance  of  Theorem  17  is  that  it  enables  us  to  make  use  of  the  highly 
optimized  algorithms  [46,  71]  and  tools  [108]  available  for  verifying  LTL  formulas  on 
Kripke  structures  to  verify  SE-LTL  specifications  on  labeled  Kripke  structures,  at  no 
additional  cost. 

8.3.5  SE-LTL  Counterexamples 

In case an LKS $M$ does not satisfy an SE-LTL formula $\phi$, we will need to construct a
counterexample so as to perform abstraction refinement. In this section, we formally present
the notion of a counterexample to an SE-LTL formula. We begin with a
few well-known results.

Theorem 18 Let $M_1$ and $M_2$ be two LKSs and $\phi$ be an SE-LTL formula. Then the
following statement holds:

$M_1 \preccurlyeq M_2 \wedge M_2 \models \phi \implies M_1 \models \phi$

Proof. Follows directly from the following two facts:

$M_1 \preccurlyeq M_2 \implies \mathcal{L}(M_1) \subseteq \mathcal{L}(M_2)$

$M_2 \models \phi \implies \forall \pi \in \mathcal{L}(M_2) \cdot \pi \models \phi$

□


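Theorem 19 Let $M_1$ and $M_2$ be two LKSs and $\phi$ be an SE-LTL formula. Then the
following statement holds:

$M_1 \preccurlyeq M_2 \wedge M_1 \not\models \phi \implies M_2 \not\models \phi$

Proof. Directly from Theorem 18, by contraposition: if $M_2 \models \phi$ held, then
$M_1 \preccurlyeq M_2$ would give $M_1 \models \phi$.

□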

Theorem 19 leads to the notion of a witness for the non-entailment of an SE-LTL
formula $\phi$ by an LKS $M_2$. It essentially says that an LKS $M_1$ is a witness for $M_2 \not\models \phi$
iff $M_1 \preccurlyeq M_2$ and $M_1 \not\models \phi$. Alternately, such a witness $M_1$ can be viewed as a
counterexample to $M_2 \models \phi$. In the rest of this chapter we will write Lasso to mean a
counterexample for SE-LTL since such counterexamples are shaped like a lasso. We
now present an algorithm for constructing Lassos.

Let $M$ be an LKS and $\phi$ be an SE-LTL formula such that $M \not\models \phi$. From
Theorem 17 we know that $\mathcal{L}(M \otimes B_{\neg\phi^{\flat}}) \neq \emptyset$. Let $\pi_{\otimes} = \langle (s_0, b_0), (s_1, b_1), \ldots \rangle$ be
an arbitrary element of $\mathcal{L}(M \otimes B_{\neg\phi^{\flat}})$. Clearly there exists a sequence of actions
$\langle \alpha_0, \alpha_1, \ldots \rangle$ such that the following two conditions hold: (i) $\langle s_0, \alpha_0, s_1, \alpha_1, \ldots \rangle \in
\mathcal{L}(M)$, and (ii) $\forall i \geq 0 \cdot L_M(s_i) \cup \{\alpha_i\} \in L_{B_{\neg\phi^{\flat}}}(b_i)$. Let us denote the sequence
$\langle s_0, \alpha_0, s_1, \alpha_1, \ldots \rangle$ by $\pi$. Since $M$ has a finite number of states and a finite alphabet
(recall that in general $M$ will be obtained by predicate abstraction of a program), $\pi$
induces an LKS $CE$ such that:

• $S_{CE} = \{s_i \mid i \geq 0\}$, $Init_{CE} = \{s_0\}$, $AP_{CE} = AP_M$, $\Sigma_{CE} = \Sigma_M$

• $\forall s \in S_{CE} \cdot L_{CE}(s) = L_M(s)$, $T_{CE} = \{(s_i, \alpha_i, s_{i+1}) \mid i \geq 0\}$

The significance of $CE$ is captured by the following result which essentially states
that $CE$ is a Lasso for $M \not\models \phi$.
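
The construction of $CE$ can be sketched for a path given in finite form, as a stem followed by a loop that repeats forever. This finite representation is an assumption of the sketch (the text above only requires $\pi$ to be some accepted path), and the function name `lasso_to_lks` and the example LKS are hypothetical.

```python
def lasso_to_lks(stem, loop, label_m, ap_m, sigma_m):
    """Build the LKS CE induced by the path stem . loop^omega.

    stem and loop are lists of (state, action) pairs (s_i, alpha_i);
    loop must be non-empty. label_m, ap_m and sigma_m are taken from M.
    """
    steps = stem + loop
    states = {s for (s, _) in steps}
    # the successor of the last loop position is the first loop position
    targets = [s for (s, _) in steps[1:]] + [loop[0][0]]
    trans = {(s, a, t) for (s, a), t in zip(steps, targets)}
    return {
        "states": states,
        "init": {steps[0][0]},
        "ap": ap_m,
        "sigma": sigma_m,
        "label": {s: label_m[s] for s in states},
        "trans": trans,
    }


# Hypothetical LKS M and a lasso-shaped path through it.
label_m = {"s0": frozenset({"p"}), "s1": frozenset({"q"})}
m_trans = {("s0", "a", "s1"), ("s1", "b", "s1"), ("s1", "c", "s0")}
ce = lasso_to_lks([("s0", "a")], [("s1", "b")], label_m, {"p", "q"}, {"a", "b", "c"})
```

Because every transition of `ce` is also a transition of the hypothetical `M` and labels are copied unchanged, the identity relation on the states of `ce` relates each state to itself in `M`, which is the shape of the simulation argument used for Theorem 20.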

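Theorem 20 Let $M$ be an LKS and $\phi$ be an SE-LTL formula. Suppose that $M \not\models \phi$
and let $CE$ be the LKS as described above. Then the following holds:

$CE \preccurlyeq M \wedge CE \not\models \phi$

Proof. To prove that $CE \preccurlyeq M$ we show that the relation $\mathcal{R} = \{(s, s) \mid s \in S_{CE}\}$
satisfies the following two conditions: (i) $\mathcal{R}$ is a simulation relation, and (ii) $\forall s_1 \in
Init_{CE} \cdot \exists s_2 \in Init_M \cdot (s_1, s_2) \in \mathcal{R}$.

Recall that $\pi_{\otimes} = \langle (s_0, b_0), (s_1, b_1), \ldots \rangle$ was the arbitrary element of $\mathcal{L}(M \otimes B_{\neg\phi^{\flat}})$
used to construct $CE$. To prove that $CE \not\models \phi$ we show that $\pi_{\otimes} \in \mathcal{L}(CE \otimes B_{\neg\phi^{\flat}})$
and hence $\mathcal{L}(CE \otimes B_{\neg\phi^{\flat}}) \neq \emptyset$.

□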

Let us denote by ModelCheck an algorithm which takes as input an LKS $M$
and an SE-LTL formula $\phi$. The algorithm checks for the emptiness of $\mathcal{L}(M \otimes B_{\neg\phi^{\flat}})$.
If $\mathcal{L}(M \otimes B_{\neg\phi^{\flat}})$ is empty, it returns "$M \models \phi$". Otherwise it computes (as described
above) and returns a Lasso $CE$ for $M \not\models \phi$.

Theorem  21  Algorithm  ModelCheck  is  correct. 

Proof.  Follows  from  Theorem  17  and  Theorem  20. 


□

8.4  A  Surge  Protector


We  describe  a  safety-critical  current  surge  protector  in  order  to  illustrate  the 
advantages  of  state/event-based  implementations  and  specifications  over  both  the 
pure  state-based  and  the  pure  event-based  approaches.  The  surge  protector  is  meant 
at  all  times  to  disallow  changes  in  current  beyond  a  varying  threshold.  The  labeled 
Kripke  structure  in  Figure  8.1  captures  the  main  functional  aspects  of  such  a  protector 
in  which  the  possible  values  of  the  current  and  threshold  are  0,  1,  and  2.  The  threshold 
value  is  stored  in  the  variable  m  and  the  value  of  the  current  is  stored  in  variable 
c.  Changes  in  threshold  and  current  values  are  respectively  communicated  via  the 
events m0, m1, m2, and c0, c1, c2.

Note, for instance, that when $m = 1$ the protector accepts changes in current
to values 0 and 1, but not 2 (in practice, an attempt to hike the current up to
2 should trigger, say, a fuse and a jump to an emergency state, behaviors which
are here abstracted away). The reader may object that we have only allowed for
Boolean variables in our definition of labeled Kripke structures; it is however trivial
to implement more complex types, such as bounded integers, as Boolean encodings,
and we have therefore elided such details here.


Figure 8.1: The LKS of a surge protector

The required specification is neatly captured as the following SE-LTL formula:

$\phi_{se} = \mathbf{G}((c2 \Rightarrow m = 2) \wedge (c1 \Rightarrow (m = 1 \vee m = 2)))$.
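
Because $\phi_{se}$ is a $\mathbf{G}$ applied to a formula without temporal operators, it can be checked on an explicit LKS by inspecting each state together with each event it can perform. The following is a minimal sketch under stated assumptions: the transition structure is reconstructed from the prose (state $i$ stands for $m = i$, a self-loop carries every event cj with $j \leq i$, and every other threshold value is reachable in one step), every state is assumed reachable, and the function names are hypothetical. Grouping the self-loop events into one edge per state is a modeling choice of the sketch.

```python
def surge_protector(max_value=2):
    """Edges of a surge-protector LKS; state i stands for m = i.

    Returns a set of (source, events, target) triples, where events is a
    frozenset of event names labeling the edge.
    """
    values = range(max_value + 1)
    edges = set()
    for m in values:
        # the current may change to any value not exceeding the threshold
        edges.add((m, frozenset(f"c{c}" for c in values if c <= m), m))
        # the threshold may change to any other value
        for k in values:
            if k != m:
                edges.add((m, frozenset({f"m{k}"}), k))
    return edges


def violations(edges):
    """Pairs (m, event) that break G((c2 => m = 2) and (c1 => (m = 1 or m = 2)))."""
    bad = []
    for (m, events, _) in edges:
        for e in events:
            if (e == "c2" and m != 2) or (e == "c1" and m not in (1, 2)):
                bad.append((m, e))
    return bad


edges = surge_protector()
# A deliberately faulty variant: the current may jump to 2 while m = 1.
faulty = edges | {(1, frozenset({"c2"}), 1)}
```

Under these assumptions `violations(edges)` is empty, while `violations(faulty)` reports the offending pair `(1, "c2")`. With the default range the sketch yields three states and nine edges.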

By way of comparison, Figure 8.2 represents the (event-free) Kripke structure
that captures the same behavior as the LKS of Figure 8.1. In this pure state-based
formalism, nine states are required to capture all the reachable combinations
of threshold ($m = i$) and last current change ($c = j$) values. Note that the surge
protector does not guarantee $c \leq m$. Indeed, states where $c > m$ (e.g., $m = 1$ and
$c = 2$) are reachable since the value of $m$ can be decreased while keeping the value of
$c$ unchanged.


Figure  8.2:  The  Kripke  structure  of  a  surge  protector 

The  data  (9  states  and  39  transitions)  compares  unfavorably  with  that  of  the 
LKS  in  Figure  8.1  (3  states  and  9  transitions).  Moreover,  as  the  allowable  current 
ranges  increase,  the  number  of  states  of  the  LKS  will  grow  linearly,  as  opposed  to 
quadratically for the Kripke structure. The number of transitions of both will grow
quadratically, but with a roughly four-fold larger factor for the Kripke structure.
These  observations  highlight  the  advantages  of  a  state/event  approach,  which  of 
course  will  be  more  or  less  pronounced  depending  on  the  type  of  system  under 
consideration. 

Another advantage of the state/event approach is witnessed when one tries to
write down specifications. In this instance, the specification we require is

$\phi_s = \mathbf{G}(((c = 0 \vee c = 2) \wedge \mathbf{X}(c = 1)) \Rightarrow (m = 1 \vee m = 2)) \wedge
\mathbf{G}(((c = 0 \vee c = 1) \wedge \mathbf{X}(c = 2)) \Rightarrow m = 2)$,

which is arguably significantly more complex than $\phi_{se}$. The pure event-based
specification $\phi_e$ capturing the same requirement is also clearly more complex than
$\phi_{se}$:

$\phi_e = \mathbf{G}(m0 \Rightarrow ((\neg c1) \mathbin{\mathbf{W}} (m1 \vee m2))) \wedge
\mathbf{G}(m0 \Rightarrow ((\neg c2) \mathbin{\mathbf{W}} m2)) \wedge
\mathbf{G}(m1 \Rightarrow ((\neg c2) \mathbin{\mathbf{W}} m2))$.

The  greater  simplicity  of  the  implementation  and  specification  associated  with 
the  state/event  formalism  is  not  purely  a  matter  of  aesthetics,  or  even  a  safeguard 
against  subtle  mistakes;  experiments  also  suggest  that  the  state/event  formulation 
yields  significant  gains  in  both  time  and  memory  during  verification.  We  implemented 
three  parameterized  instances  of  the  surge  protector  as  simple  C  programs,  in  one 
case  allowing  message  passing  (representing  the  LKS),  and  in  the  other  relying  solely 
on  local  variables  (representing  the  Kripke  structure).  We  also  wrote  corresponding 
specifications  respectively  as  SE-LTL  and  LTL  formulas  (as  above)  and  converted 
these into Büchi automata using the tool wring [113]. Table 8.1 records the number
of Büchi states and transitions associated with the specification, as well as the time
taken by MAGIC to construct the Büchi automaton and confirm that the corresponding
implementation indeed meets the specification.

Range | Pure State                  | Pure Event                  | State/Event
      |  St    Tr      B-T      T-T |  St    Tr      B-T      T-T |  St    Tr      B-T      T-T
------+-----------------------------+-----------------------------+----------------------------
    2 |   4     5      253      383 |   6    10      245      320 |   3     4      184      252
    3 |   8    12      270      545 |  12    23      560      674 |   4     6      298      407
    4 |  14    23      492     1141 |  20    41     1597     1770 |   5     8      243      391
    5 |  22    38     1056     2326 |  30    64     3795     4104 |   6    10      306      497
    6 |  32    57     2428     4818 |  42    92    12077    12660 |   7    12      614      962
    7 |  44    80     6249    10358 |  56   125    54208    55064 |   8    14      930     1321
    8 |  58   107    17503    24603 |  72   163   372784   374166 |   9    16     2622     3133
    9 |  74   138    55950    67553 |   *     *        *        * |  10    18     8750     9488
   10 |  92   173   195718   213969 |   *     *        *        * |  11    20    33556    34503
   11 |   *     *        *        * |   *     *        *        * |  12    22   135252   136500
   12 |   *     *        *        * |   *     *        *        * |  13    24   534914   536451
   13 |   *     *        *        * |   *     *        *        * |   *     *        *        *

Table 8.1: Comparison of pure state-based, pure event-based and state/event-based formalisms.
Values of c and m range between 0 and Range. St and Tr respectively denote the number
of states and transitions of the Büchi automaton corresponding to the specification. B-T is
the Büchi construction time and T-T is the total verification time. All times are reported in
milliseconds. A * indicates that the Büchi automaton construction did not terminate in 10
minutes.
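
The growth of the specification automata recorded in Table 8.1 can be checked with a little arithmetic. The sketch below is only an illustration (the helper names `differences` and `growth_kind` are ours, not part of MAGIC): it takes the St columns of the table and looks at successive differences, where constant first differences indicate linear growth and constant second differences indicate quadratic growth.

```python
def differences(values):
    """Return the successive differences of a sequence of numbers."""
    return [b - a for a, b in zip(values, values[1:])]


def growth_kind(values):
    """Classify a sequence as 'linear', 'quadratic' or 'other' by finite differences."""
    first = differences(values)
    if len(set(first)) == 1:
        return "linear"
    if len(set(differences(first))) == 1:
        return "quadratic"
    return "other"


# St column of Table 8.1, restricted to the rows where construction terminated.
PURE_STATE = [4, 8, 14, 22, 32, 44, 58, 74, 92]        # Range 2..10
PURE_EVENT = [6, 12, 20, 30, 42, 56, 72]               # Range 2..8
STATE_EVENT = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]    # Range 2..12

if __name__ == "__main__":
    for name, column in [("pure state", PURE_STATE),
                         ("pure event", PURE_EVENT),
                         ("state/event", STATE_EVENT)]:
        print(name, growth_kind(column))
```

The same check could be applied to the Tr columns; the timing columns are measurements and are not expected to follow an exact polynomial.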

A careful inspection of Table 8.1 reveals several consistent trends.
First, the number of Büchi states increases quadratically with the value of Range for
both the pure state-based and pure event-based formalisms. In contrast, the increase
is only linear when both states and events are used. We notice a similar pattern
among the number of transitions in the Büchi automata. The rapid increase in the
sizes of Büchi automata will naturally contribute to increased model checking time.
However, we notice that the major portion of the total verification time is required
to construct the Büchi automaton. While this time increases rapidly in all three
formalisms, the growth is observed to be most benign for the state/event scenario.
The net result is clearly evident from Table 8.1. Using both states and events allows
us to push the limits of c and m beyond what is possible by using either states or
events alone.

8.5  SE-LTL  Verification  of  C  Programs


In  this  section  we  present  a  compositional  CEGAR  framework  for  verifying  C 
programs  against  SE-LTL  specifications.  Our  framework  for  SE-LTL  will  have  the 
same  general  structure  as  that  for  simulation  presented  in  earlier  chapters.  The 
crucial  difference  from  simulation  arises  due  to  the  difference  in  the  structure  of  the 
counterexamples.  Recall  that  a  Counterexample  Witness  for  simulation  always  has  a 
tree-like  structure  and  hence  is  acyclic.  In  contrast  a  Lasso  always  has  an  infinite  path 
and  hence  must  also  necessarily  have  a  cycle.  Thus  we  have  to  modify  our  algorithms 
for  counterexample  validation  and  abstraction  refinement  to  take  into  account  the 
cyclic  structure  present  in  Lassos. 

8.5.1  Compositional  Lasso  Validation 

The notion of projections for Lassos is exactly the same as that for Counterexample
Witnesses, since projections are defined on LKSs and are unaffected by the presence
or absence of cycles. The algorithm for validating Lasso projections is called
WeakSimLasso and is presented in Procedure 8.1. WeakSimLasso takes as input
a projected Lasso $CE$, a component $C$, and a context $\gamma$ for $C$. It returns TRUE if
$[\![C]\!]_\gamma$ weakly simulates $CE$ and FALSE otherwise. WeakSimLasso manipulates sets
of states of $[\![C]\!]_\gamma$ using the symbolic techniques presented in Section 3.4. In
particular it uses the functions PreImage and Restrict to compute pre-images and
restrict sets of states with respect to propositions.

WeakSimLasso iteratively computes, for each state $s$ of $CE$, the set of states
$Sim(s)$ of $[\![C]\!]_\gamma$ which can weakly simulate $s$. It returns TRUE if
$Sim(Init_{CE}) \cap Init_{[\![C]\!]_\gamma} \neq \emptyset$; otherwise it returns FALSE. Note that the
process of computing $Sim$ essentially involves the computation of a greatest fixed
point and might not terminate in general. Hence WeakSimLasso is really a
semi-algorithm. But this situation can hardly be improved, since checking whether
$[\![C]\!]_\gamma$ weakly simulates $CE$ is undecidable in general.

Procedure 8.1 WeakSimLasso returns TRUE iff [[C]]_γ weakly simulates CE.

Algorithm WeakSimLasso(CE, C, γ)
  - CE : is a Lasso, C : is a component, γ : is a context for C
  let CE = (S1, Init1, AP1, L1, Σ1, T1);
  let [[C]]_γ = (S2, Init2, AP2, L2, Σ2, T2);
      // [[C]]_γ is the concrete semantics of C with respect to γ
  for each s ∈ S1, Sim(s) := Restrict(S2, L1(s));
      // Sim is a map from S1 to 2^S2
      // intuitively, Sim(s) contains the states of [[C]]_γ which can weakly simulate s
      // initially, Sim(s) = subset of S2 with same propositional labeling as s
  forever do
      if Sim(Init1) ∩ Init2 = ∅ then return FALSE;
      OldSim := Sim;
      for each s --α--> s' ∈ T1    // s' is a successor state of s
          if (α = τ) then Sim(s) := Sim(s) ∩ Sim(s');
          else Sim(s) := Sim(s) ∩ PreImage(Sim(s'), α);
      if Sim = OldSim then return TRUE;

Theorem  22  Algorithm  WeakSimLasso  is  correct. 

Proof. From the definition of weak simulation, the correctness of Restrict and
PreImage, and the fact that $\tau$ does not belong to the alphabet of $[\![C]\!]_\gamma$.

□ 


8.5.2  Abstraction  Refinement 

Algorithm AbsRefSELTL, presented in Procedure 8.2, refines an abstraction on the
basis of a spurious Lasso projection. It is very similar to algorithm AbsRefMin
(Procedure 7.1) and we will not explain it further here. In order to check if a set of
branches $B$ can eliminate a spurious Lasso $CE$, it first creates a predicate abstraction
$M$ using $B$ as the set of seed branches. It then invokes procedure AbsSimLasso to
check if $M$ can weakly simulate $CE$. As we already know, $B$ can eliminate $CE$ iff $M$
cannot weakly simulate $CE$.

Note that, as in the case of simulation conformance, our algorithm for
constructing predicate mappings can only derive predicates from branch conditions.
Therefore, in principle, we might be unable to eliminate a spurious Lasso. In the
context of algorithm AbsRefSELTL, this means that we could end up trying all
sets of branches without finding an appropriate refined abstraction $M$. In such a case
we return ERROR.
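
The selection step that AbsRefSELTL delegates to the pseudo-Boolean solver PBS can be pictured with a small brute-force sketch. This is not how PBS works internally; it simply enumerates candidate branch sets by increasing size. The function name `smallest_eliminating_set` and the input encoding (one list of eliminating branch sets per spurious Lasso) are assumptions made for illustration only.

```python
from itertools import combinations


def smallest_eliminating_set(branches, eliminating_sets_per_ce):
    """Brute-force stand-in for the pseudo-Boolean solver (illustrative only).

    branches: iterable of sortable, hashable branch identifiers.
    eliminating_sets_per_ce: one list per spurious Lasso; each list holds the
        branch sets B that were found to eliminate that Lasso.
    Returns a smallest set of branches that contains, for every Lasso, at least
    one of its eliminating sets, or None if no such set exists (the ERROR case).
    """
    ordered = sorted(branches)
    constraints = [[frozenset(b) for b in sets] for sets in eliminating_sets_per_ce]
    # Try candidate sets in order of increasing size, so the first hit is minimal.
    for size in range(len(ordered) + 1):
        for candidate in combinations(ordered, size):
            chosen = frozenset(candidate)
            if all(any(b <= chosen for b in sets) for sets in constraints):
                return set(chosen)
    return None
```

Each inner list plays the role of one disjunction of conjunctions accumulated for a spurious Lasso, and the outer list plays the role of their conjunction. Enumerating by size mirrors the objective of selecting as few branches as possible, at exponential cost in the number of branches.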

Procedure 8.2 AbsRefSELTL returns a refined abstraction for C that eliminates a spurious
Lasso projection CE and ERROR on failure. The parameter φ initially expresses constraints
about branches which can eliminate all previous spurious Lasso projections. AbsRefSELTL
also updates φ with the constraints for the new spurious Lasso projection CE.

Algorithm AbsRefSELTL(CE, C, φ, γ)
  - CE : is a spurious Lasso, φ : is a Boolean formula
  - C : is a component, γ : is a context for C
  φ_CE := FALSE;
  for each B ⊆ B_C    // B_C is the set of branches in C
      Π := PredInfer(C, γ, B);    // Π is the set of predicates inferred from B
      M := [[C]]^Π_γ;             // M is the predicate abstraction of C using Π
      if ¬AbsSimLasso(CE, M) then φ_CE := φ_CE ∨ ⋀_{b_i ∈ B} v_i;
          // M cannot weakly simulate CE, hence B can eliminate CE
  φ := φ ∧ φ_CE;    // update φ with the constraints for CE
  invoke PBS to solve (φ, Σ_i v_i);
  if φ is unsatisfiable then return ERROR;    // no set of branches can eliminate CE
  else let s_opt = solution returned by PBS;
  let {v_1, ..., v_m} = variables assigned TRUE by s_opt and B = {b_1, ..., b_m};
  Π := PredInfer(C, γ, B);    // Π is the set of predicates inferred from B
  return [[C]]^Π_γ;           // return the predicate abstraction of C using Π

Procedure 8.3 AbsSimLasso returns TRUE iff M can weakly simulate CE.

Algorithm AbsSimLasso(CE, M)
  - CE : is a Lasso, M : is an LKS obtained by predicate abstraction
  let CE = (S1, Init1, AP1, L1, Σ1, T1);
  let M = (S2, Init2, AP2, L2, Σ2, T2);
  for each s ∈ S1, Sim(s) := {t ∈ S2 | L2(t) = L1(s)};
      // Sim is a map from S1 to 2^S2
      // intuitively, Sim(s) contains the states of M which can weakly simulate s
      // initially, Sim(s) = subset of S2 with same propositional labeling as s
  forever do
      if Sim(Init1) ∩ Init2 = ∅ then return FALSE;
      OldSim := Sim;    // save old map for subsequent fixed point detection
      for each s --α--> s' ∈ T1    // s' is a successor state of s
          if (α = τ) then Sim(s) := Sim(s) ∩ Sim(s');
          else Sim(s) := {t ∈ Sim(s) | Succ(t, α) ∩ Sim(s') ≠ ∅};
      if Sim = OldSim then return TRUE;
          // fixed point reached, hence M weakly simulates CE
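
For a finite abstraction the fixed point computation of AbsSimLasso can be written out explicitly. The following is a minimal sketch under assumptions of our own: an LKS is passed as a tuple (states, initial, label map, list of transitions), the Lasso has a single initial state, and the silent action is represented by the string `"tau"`. It follows the structure of Procedure 8.3 but is not the MAGIC implementation.

```python
TAU = "tau"  # assumed representation of the silent action


def successors(trans, state, action):
    """States reachable from `state` by one transition labeled `action`."""
    return {q for (p, a, q) in trans if p == state and a == action}


def abs_sim_lasso(ce, m):
    """Return True iff the finite LKS `m` can weakly simulate the Lasso `ce`.

    ce = (states, init_state, label, trans) with a single initial state.
    m  = (states, init_states, label, trans) with a set of initial states.
    """
    ce_states, ce_init, ce_label, ce_trans = ce
    m_states, m_init, m_label, m_trans = m
    # Initially Sim(s) holds every state of m with the same labeling as s.
    sim = {s: frozenset(t for t in m_states if m_label[t] == ce_label[s])
           for s in ce_states}
    while True:
        if not (sim[ce_init] & m_init):
            return False
        old = dict(sim)  # sets are replaced, never mutated, so a shallow copy suffices
        for (s, a, s2) in ce_trans:
            if a == TAU:
                sim[s] = sim[s] & sim[s2]
            else:
                sim[s] = frozenset(t for t in sim[s]
                                   if successors(m_trans, t, a) & sim[s2])
        if sim == old:
            return True  # fixed point reached
```

Because every update can only shrink a finite set, this loop always terminates, in contrast to WeakSimLasso, which works on the possibly infinite concrete semantics.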


Theorem  23  Algorithm  AbsRefSELTL  is  correct. 

Proof. It is obvious that AbsRefSELTL either returns ERROR or a refined
abstraction $M$ such that, for every spurious Lasso projection $CE$ seen so far, $M$ does
not weakly simulate $CE$.

□ 


8.6  CEGAR  for  SE-LTL 

The complete CEGAR algorithm in the context of SE-LTL conformance, called
SELTL-CEGAR, is presented in Procedure 8.4. It invokes at various
stages algorithms PredInfer, the predicate abstraction algorithm, ModelCheck,
WeakSimLasso and AbsRefSELTL. It takes as input a program $\mathcal{P}$, an SE-LTL
specification $\phi$ and a context $\Gamma$ for $\mathcal{P}$, and outputs either
“$\mathcal{P} \models \phi$”, “$\mathcal{P} \not\models \phi$” or ERROR. Intuitively SELTL-CEGAR works as follows.


Procedure 8.4 SELTL-CEGAR checks entailment between a program P and an SE-LTL
specification φ in a context Γ.

Algorithm SELTL-CEGAR(P, φ, Γ)
  - P : is a program, Γ : is a context for P, φ : is an SE-LTL formula
  let P = (C_1, ..., C_n) and Γ = (γ_1, ..., γ_n);
  for each i ∈ {1, ..., n}
      Π_i := PredInfer(C_i, γ_i, ∅) and M_i := [[C_i]]^Π_i_γ_i and φ_i := TRUE;
      // M_i = initial predicate abstractions of C_i with empty set of seed branches
  forever do
      let M = M_1 || ... || M_n;
          // M is the composition of predicate abstractions
      if (ModelCheck(M, φ) = “M ⊨ φ”) return “P ⊨ φ”;
          // if M satisfies φ then so does P
      let CE = Lasso returned by ModelCheck;
      find i ∈ {1, ..., n} such that ¬WeakSimLasso(CE ↾ γ_i, C_i, γ_i);
          // check compositionally if CE is spurious
      if (no such i found) return “P ⊭ φ”;    // CE is valid and hence P ⊭ φ
      if (AbsRefSELTL(CE ↾ γ_i, C_i, φ_i, γ_i) = ERROR) return ERROR;
          // no set of branches can eliminate CE ↾ γ_i
      M_i := AbsRefSELTL(CE ↾ γ_i, C_i, φ_i, γ_i);    // refine the abstraction and repeat


Let $\mathcal{P} = (C_1, \ldots, C_n)$. Then SELTL-CEGAR maintains a set of abstractions
$M_1, \ldots, M_n$ where $M_i$ is a predicate abstraction of $C_i$ for $i \in \{1, \ldots, n\}$. Note that
by Theorem 7, $M = M_1 \parallel \cdots \parallel M_n$ is an abstraction of $\mathcal{P}$. Initially each $M_i$ is set
to the predicate abstraction of $C_i$ corresponding to an empty set of seed branches.
Also, for each $C_i$, SELTL-CEGAR maintains a Boolean formula $\phi_i$ (initialized to
TRUE) used for predicate minimization. Next SELTL-CEGAR iteratively performs
the following steps:

1. (Verify) Invoke algorithm ModelCheck to check if $M$ satisfies $\phi$. If
ModelCheck returns “$M \models \phi$” then output “$\mathcal{P} \models \phi$” and exit. Otherwise
let $CE$ be the Lasso returned by ModelCheck. Go to step 2.

2. (Validate) For $i \in \{1, \ldots, n\}$ invoke WeakSimLasso($CE \upharpoonright \gamma_i$, $C_i$, $\gamma_i$). If
every invocation of WeakSimLasso returns TRUE then output “$\mathcal{P} \not\models \phi$”
and exit. Otherwise let $i$ be the minimal element of $\{1, \ldots, n\}$ such that
WeakSimLasso($CE \upharpoonright \gamma_i$, $C_i$, $\gamma_i$) returns FALSE. Go to step 3.

3. (Refine) Invoke AbsRefSELTL($CE \upharpoonright \gamma_i$, $C_i$, $\phi_i$, $\gamma_i$). If AbsRefSELTL
returns ERROR, output ERROR and stop. Otherwise set $M_i$ to the abstraction
returned by AbsRefSELTL. Repeat from step 1.
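
The three steps above can be sketched as a generic loop. This is a skeleton only: the callbacks `abstract`, `model_check`, `validate` and `refine` are hypothetical placeholders standing in for predicate abstraction, ModelCheck, WeakSimLasso and AbsRefSELTL, and the string results are our own encoding of the three possible outcomes.

```python
def seltl_cegar(components, abstract, model_check, validate, refine):
    """Generic verify / validate / refine loop (illustrative skeleton).

    abstract(c)          -> initial abstraction of component c
    model_check(models)  -> None if the composed abstraction satisfies the
                            specification, otherwise a counterexample
    validate(ce, i)      -> True if component i can reproduce ce
    refine(ce, i, model) -> a refined abstraction of component i, or None on failure
    """
    models = [abstract(c) for c in components]
    while True:
        # Step 1 (Verify): model check the composition of the abstractions.
        ce = model_check(models)
        if ce is None:
            return "holds"
        # Step 2 (Validate): look for components that cannot reproduce ce.
        spurious = [i for i in range(len(models)) if not validate(ce, i)]
        if not spurious:
            return "violated"
        i = spurious[0]  # minimal index, as in step 2
        # Step 3 (Refine): refine only the abstraction of component i.
        refined = refine(ce, i, models[i])
        if refined is None:
            return "error"
        models[i] = refined
```

Only one component abstraction changes per iteration, which is what keeps validation and refinement component-wise; the loop itself need not terminate for arbitrary callbacks.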

Theorem  24  Algorithm  SELTL-CEGAR  is  correct. 

Proof. When SELTL-CEGAR returns “$\mathcal{P} \models \phi$” its correctness follows from
Theorem 7, Theorem 21 and Theorem 1. When SELTL-CEGAR returns “$\mathcal{P} \not\models \phi$”
its correctness follows from Theorem 12, Theorem 22 and Theorem 23.


□ 


8.7  Experimental  Results 

We experimented with two broad sets of benchmarks. All our experiments were
performed on an AMD Athlon XP 1600+ machine with 900 MB RAM running RedHat
Linux 7.1. The first set of examples was based on OpenSSL, a popular implementation
of the SSL protocol, which is used for secure exchange of sensitive information over
untrusted networks. The target of our verification process was the implementation of
the initial handshake required for the establishment of a secure channel between a
client and a server.

Name       St(B)  Tr(B)   St(Mdl)  T(BA)  T(Mdl)   T(Ver)  T(Total)    Mem
srvr-1-ss      4      5      5951    213   32195     1654     34090      -
srvr-1-se      3      4      4269    209   18116     1349     19674      -
srvr-2-ss     11     23      4941    292   31331     2479     34102      -
srvr-2-se      3      4      4269    196   17897     1317     19410      -
srvr-3-ss     37    149      5065   1147   26958     4031     32137      -
srvr-3-se      3      4      4269    462   17950     1908     20319      -
srvr-4-ss     16     41      5446    806   29809     7382     39341   28.6
srvr-4-se      7     14      4333    415   21453     3513     25906   24.1
srvr-5-ss     25     47      7951    690   48810     6842     56888   39.3
srvr-5-se     20     45      4331    497   18808     2925     22765   24.2
clnt-1-ss     16     41      4867    793   24488     1235     26953   25.8
clnt-1-se      7     14      3693    376   17250      583     18683   22.1
clnt-2-ss     25     47      7574    699   43592     1649     46444   38.1
clnt-2-se     18     40      3691    407   15304     1087     17269   21.2
ssl-1-ss      25     47  24799528    874   65585        *         *  850.5
ssl-1-se      20     45  13558984    655   33091  2172139   2206983  162.4
ssl-2-ss      25     47  32597042    836   66029        *         *  346.6
ssl-2-se      18     40  15911791    713   34641  4148550   4185068  320.7
UCOS-BUG       8     14       873    205    3409      261      3880      -
UCOS-1         8     14       873    194    3365     2797      6357      -
UCOS-2         5      8       873    123    3372     2630      6127      -

Table 8.2: Experimental results with OpenSSL and µC/OS-II. St(B) and Tr(B) =
respectively the number of states and transitions in the Büchi automaton; St(Mdl) = number
of states in the model; T(Mdl) = model construction time; T(BA) = Büchi construction
time; T(Ver) = model checking time; T(Total) = total verification time. All reported times
are in milliseconds. Mem is the total memory requirement in MB. A * indicates that the model
checking did not terminate within 2 hours and was aborted. In such cases, other measurements
were made at the point of forced termination. A - indicates that the corresponding measurement
was not taken.

From  the  official  SSL  specification  [109]  we  derived  a  set  of  nine  properties  that 
every  correct  SSL  implementation  should  satisfy.  The  first  five  properties  are  relevant 
only  to  the  server,  the  next  two  apply  only  to  the  client,  and  the  last  two  properties 
refer  to  both  a  server  and  a  client  executing  concurrently.  For  instance,  the  first 
property  states  that  whenever  the  server  asks  the  client  to  terminate  the  handshake, 
it  eventually  either  gets  a  correct  response  from  the  client  or  exits  with  an  error  code. 
The  second  property  expresses  the  fact  that  whenever  the  server  receives  a  handshake 
request  from  a  client,  it  eventually  acknowledges  the  request  or  returns  with  an  error 
code. The third property states that a server never exchanges encryption keys with
a client once the cipher scheme has been changed.

Each of these properties was then expressed in SE-LTL, once using only states
and again using both states and events. Table 8.2 summarizes the results of our
experiments with these benchmarks. The SSL benchmarks have names of the form
x-y-z, where x denotes the type of the property and can be either srvr, clnt or ssl,
depending on whether the property refers respectively to only the server, only the
client, or both server and client; y denotes the property number, while z denotes the
specification style and can be either ss (only states) or se (both states and events).
We note that in each case the numbers for state/event properties are considerably
better than those for the corresponding pure-state properties. For example, the total
verification time for the first server property drops from 34090 ms (srvr-1-ss) to
19674 ms (srvr-1-se).
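
The size of this improvement can be quantified directly from the T(Total) column of Table 8.2. The short sketch below computes, for each server and client property, the ratio of pure-state to state/event total verification time. The dictionary layout and the function name `speedups` are ours, and the two ssl properties are left out because their pure-state runs did not terminate.

```python
# T(Total) in milliseconds from Table 8.2: (pure-state "ss", state/event "se").
TOTAL_MS = {
    "srvr-1": (34090, 19674),
    "srvr-2": (34102, 19410),
    "srvr-3": (32137, 20319),
    "srvr-4": (39341, 25906),
    "srvr-5": (56888, 22765),
    "clnt-1": (26953, 18683),
    "clnt-2": (46444, 17269),
}


def speedups(table):
    """Map each property to the ratio T(Total, ss) / T(Total, se)."""
    return {name: ss / se for name, (ss, se) in table.items()}


if __name__ == "__main__":
    for name, ratio in sorted(speedups(TOTAL_MS).items()):
        print(f"{name}: {ratio:.2f}x")
```

Every ratio is above 1, so the state/event formulation is faster on all seven properties that terminated in both styles.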

The second set of our benchmarks was obtained from the source code of µC/OS-II
version 2.70 (which we will refer to simply as µC/OS-II in the rest of this thesis).
This is a popular, lightweight, real-time, multi-tasking operating system written in
about 6000 lines of ANSI C. µC/OS-II uses a lock to ensure mutual exclusion for
critical section code. Using SE-LTL we expressed two properties of µC/OS-II: (i) the
lock is acquired and released alternately, starting with an acquire, and (ii) every time
the lock is acquired it is eventually released. These properties were expressed using
only events.

We found four bugs in µC/OS-II that cause it to violate the first property. One
of these bugs was unknown to the developers while the other three had been found
previously. The second property was found to be valid. In Table 8.2 these experiments
are named UCOS-BUG and UCOS-2 respectively. Next we fixed the bugs and
verified that the first property holds for the corrected µC/OS-II. This experiment is
called UCOS-1 in Table 8.2.

Chapter  9


Two-Level  Abstraction  Refinement 


In  this  chapter,  we  attempt  to  address  the  state-space  explosion  problem  in  the 
context  of  verifying  simulation  conformance  between  a  concurrent  (message-passing) 
C  program  and  an  LKS  specification.  More  specifically,  we  present  a  fully 
automated  compositional  framework  which  combines  two  orthogonal  abstraction 
techniques  (operating  respectively  on  data  and  events)  within  a  counterexample 
guided  abstraction  refinement  (CEGAR)  scheme.  In  this  way,  our  algorithm 
incrementally  increases  the  granularity  of  the  abstractions  until  the  specification  is 
either  established  or  refuted.  Our  explicit  use  of  compositionality  delays  the  onset 
of  state  space  explosion  for  as  long  as  possible.  To  our  knowledge,  this  is  the  first 
compositional  use  of  CEGAR  in  the  context  of  model  checking  concurrent  C  programs. 
We describe our approach in detail, and report on some very encouraging preliminary
experimental results obtained with our tool MAGIC.

9.1  Introduction


As mentioned before, there has been a tremendous amount of research and
advancement over the years devoted to the abstract modeling and validation of
concurrent systems and their specifications. However, the majority of these advances
target specific (and often orthogonal) aspects of the problem, but fail to solve it as
a whole. The work we present here attempts to provide a more complete approach to
efficiently verify global specifications on concurrent C programs in a fully automated
way. More specifically, we focus on reactive systems, implemented using concurrent C
programs that communicate with each other through synchronous (blocking)
message-passing. Examples of such systems include client-server protocols, web
services [15], schedulers, telecommunication applications, etc. As in previous chapters,
we consider specifications expressed as LKSs.

We propose a fully automated compositional two-level counterexample guided
abstraction refinement scheme to verify that a concurrent C program $\mathcal{P}$ conforms to
an LKS specification $Sp$ in a context $\Gamma$. Let us assume that our program $\mathcal{P}$ consists of
a set of components $(C_1, \ldots, C_n)$. Naturally, our program context $\Gamma$ must also consist
of a set of component contexts $(\gamma_1, \ldots, \gamma_n)$, one for each component $C_i$.

We first transform each $C_i$ into a finite-state predicate abstraction $\hat{C}_i$. Since
the parallel composition of these predicate abstractions may well still have an
unmanageably large state space, we further reduce each $\hat{C}_i$ by conservatively
aggregating states together, based on the actions they can perform, yielding a smaller
action-guided abstraction $A_i$; only then do we explicitly build the global state space
of the much coarser parallel composition $A = A_1 \parallel \cdots \parallel A_n$.

Recall that $[\![\mathcal{P}]\!]_\Gamma$ denotes the concrete semantics of our program $\mathcal{P}$. We know that
by construction, $[\![\mathcal{P}]\!]_\Gamma \preccurlyeq A$, i.e., $A$ exhibits all of $\mathcal{P}$'s behaviors, and usually many
more. We check $A \preccurlyeq Sp$. If successful, we conclude that $[\![\mathcal{P}]\!]_\Gamma \preccurlyeq Sp$. Otherwise,
we must examine the Counterexample Witness obtained to determine whether it is
valid or not. It is important to note that this validation can be carried out not only
component-wise (as in previous chapters), but also level-wise. Furthermore, we are
able to avoid constructing in full the large state space of $\mathcal{P}$.

A valid Counterexample Witness shows $\mathcal{P} \not\preccurlyeq Sp$ and thus terminates the
procedure. Otherwise, a (component-specific) refinement of the appropriate
abstraction is carried out, eliminating the spurious Counterexample Witness, and
the algorithm proceeds with a new iteration of the verification cycle. The crucial
features of our approach therefore consist of the following:

•  We  leverage  two  very  different  kinds  of  abstraction  to  reduce  a  concurrent  C 
program  to  a  very  coarse  parallel  composition  of  finite-state  processes.  The 
first  (predicate)  abstraction  partitions  the  (potentially  infinite)  state  space 
according  to  the  possible  values  of  variables,  whereas  the  second  (action-guided) 
abstraction  groups  these  resulting  states  together  according  to  the  actions  that 
they  can  perform. 

•  A counterexample guided abstraction refinement scheme incrementally refines
these abstractions until the right granularity is achieved to decide whether
the specification holds or not. We note that while termination of the entire
algorithm obviously cannot be guaranteed¹, all of our experimental examples
could be handled without requiring human input.

•  Our use of compositional reasoning, grounded in standard process algebraic
techniques, enables us to perform most of our analysis component by
component, without ever having to construct global state spaces except at the
most abstract level.

¹ This of course follows from the fact that the halting problem is undecidable.

The  verification  procedure  is  fully  automated,  and  requires  no  user  input  beyond 
supplying  the  C  programs  and  the  specification  to  be  verified.  We  have  implemented 
the  algorithm  within  our  tool  MAGIC  [24, 80]  and  have  carried  out  a  number  of  case 
studies,  which  we  report  here.  To  our  knowledge,  our  algorithm  is  the  first  to  invoke 
CEGAR  over  more  than  a  single  abstraction  refinement  scheme  (and  in  particular  over 
action-based  abstractions),  and  also  the  first  to  combine  CEGAR  with  fully  automatic 
compositional  reasoning  for  concurrent  systems. 

The  experiments  we  have  carried  out  range  over  a  variety  of  sequential  and 
concurrent  examples,  and  yield  promising  results.  The  two-level  approach  constructs 
models  that  are  often  almost  two  orders  of  magnitude  smaller  than  those  generated 
by  predicate  abstraction  alone.  This  also  translates  to  over  an  order  of  magnitude 
reduction  in  actual  memory  requirement.  Additionally,  the  two-level  approach  is 
faster,  especially  in  the  concurrent  benchmarks  where  it  often  reduces  verification 
time  by  a  factor  of  over  three.  Full  details  are  presented  in  Section  9.6. 


9.2  Related  Work 

Predicate abstraction was introduced in [63] as a means to conservatively transform
infinite-state systems into finite-state ones, so as to enable the use of finitary
techniques such as model checking [32,39]. It has since been widely used; see, for
instance, [5,42,44,48,50,89].

The formalization of the more general notion of abstraction first appeared in [47].
We distinguish between exact abstractions, which preserve all properties of interest
of  the  system,  and  conservative  abstractions — used  in  this  chapter — which  are  only 
guaranteed  to  preserve  safety  properties  of  the  system  (e.g.,  [36,  75]).  The  advantage 
of  the  latter  is  that  they  usually  lead  to  much  greater  reductions  in  the  state  space 
than  their  exact  counterparts.  However,  conservative  abstractions  in  general  require 
an  iterated  abstraction  refinement  mechanism  (such  as  CEGAR  [37])  in  order  to 
establish  specification  satisfaction. 

The  abstractions  we  use  on  finite-state  processes  essentially  group  together  states 
that  can  perform  the  same  set  of  actions,  and  gradually  refine  these  partitions 
according  to  reachable  successor  states.  Our  refinement  procedure  can  be  seen  as 
an  atomic  step  of  the  Paige-Tarjan  algorithm  [96],  and  therefore  yields  successive 
abstractions  which  converge  in  a  finite  number  of  steps  to  the  bisimulation  quotient 
of  the  original  process. 
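
The grouping and splitting just described can be pictured with a small partition refinement sketch. It is a generic illustration, not the refinement procedure implemented in MAGIC: transitions are assumed to be given as (source, action, target) triples, and the function names are ours.

```python
from collections import defaultdict


def enabled(trans, state):
    """Set of actions that `state` can perform."""
    return frozenset(a for (p, a, _) in trans if p == state)


def initial_partition(states, trans):
    """Group together the states that can perform the same set of actions."""
    groups = defaultdict(set)
    for s in states:
        groups[enabled(trans, s)].add(s)
    return {frozenset(g) for g in groups.values()}


def split(partition, trans, action, target):
    """One refinement step: split every block according to whether a state
    has an `action`-successor inside the block `target`."""
    can = {p for (p, a, q) in trans if a == action and q in target}
    result = set()
    for block in partition:
        inside, outside = block & can, block - can
        result.update(b for b in (inside, outside) if b)
    return result
```

Repeating `split` for every (action, block) pair until nothing changes is the classical way to reach the coarsest stable partition; a single call corresponds to one atomic step of that process.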

CEGAR  has  been  used,  among  others,  in  non-automated  [90],  and  automated  [6, 
29,40,66,77,98]  forms.  Compositionality,  which  features  crucially  in  our  work, 
is  broadly  concerned  with  the  preservation  of  properties  under  substitution  of 
components  in  concurrent  systems.  It  has  been  most  extensively  studied  in  process 
algebra  (e.g.,  [69,85,100]),  particularly  in  conjunction  with  abstraction.  In  [10],  a 
compositional  framework  for  (non-automated)  CEGAR  over  data-based  abstractions 
is  presented.  This  approach  differs  from  ours  in  that  communication  takes  place 
through  shared  variables  (rather  than  blocking  message-passing),  and  abstractions  are 
refined  by  eliminating  spurious  transitions,  rather  than  by  splitting  abstract  states. 

A technique closely related to compositionality is that of assume-guarantee
reasoning [64,67,84]. It was originally developed to circumvent the difficulties
associated with generating exact abstractions, and has recently been implemented
as part of a fully automated and incremental verification framework [41].

Among the works most closely resembling ours we note the following. The Bandera
project [44] offers tool support for the automated verification of Java programs
based on abstract interpretation; there is no automated CEGAR and no explicit
compositional support for concurrency. Pasareanu et al. [98] import Bandera-derived
abstractions into an extension of Java PathFinder which incorporates CEGAR.
However, once again no use is made of compositionality, and only a single level of
abstraction is considered. Stoller [111,112] describes another tool implemented in
Java PathFinder which explicitly supports concurrency; it uses data-type abstraction
on the first level, and partial order reduction with aggregation of invisible transitions
on the second level. Since all abstractions are exact it does not require the use of
CEGAR. The SLAM project [5,6,107] has been very successful in analyzing interfaces
written in C. It is built around a single-level predicate abstraction and automated
CEGAR treatment, and offers no explicit compositional support for concurrency.
Lastly, the BLAST project [13,65,66] proposes a single-level (i.e., only predicate
abstraction) lazy (on-the-fly) CEGAR scheme and thread-modular assume-guarantee
reasoning. The BLAST framework is based on shared variables rather than
message-passing as the communication mechanism.

The next section presents a series of standard definitions that are used throughout
the rest of this chapter. Section 9.5 then describes the two-level CEGAR algorithm.
Finally, Section 9.6 summarizes the results of our experiments.

9.3  Abstraction


Recall, from Definition 1, that an LKS is a 6-tuple $(S, Init, AP, L, \Sigma, T)$. In this
section we present our notion of abstraction. Our framework employs quotient LKSs
as abstractions of concrete LKSs. Given a concrete LKS $M$, one can obtain a quotient
LKS as follows. The states of the quotient LKS are obtained by grouping together
states of $M$ such that all states in a particular group agree on their propositional
labeling. Alternatively, one can view these groups as equivalence classes of some
equivalence relation on $S_M$. Transitions of the quotient LKS are defined existentially.
We now present a formal definition of these concepts.

Definition 27 (Propositional Compatibility) Let $M = (S, Init, AP, L, \Sigma, T)$ be
any LKS. An equivalence relation $R \subseteq S \times S$ is said to be propositionally compatible
iff the following condition holds:

$$\forall s_1 \in S \cdot \forall s_2 \in S \cdot (s_1, s_2) \in R \implies L(s_1) = L(s_2)$$

Definition 28 (Quotient LKS) Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS and
$R \subseteq S \times S$ be a propositionally compatible equivalence relation. For an arbitrary
$s \in S$ we let $[s]^R$ denote the equivalence class of $s$. $M$ and $R$ then induce a quotient
LKS $M^R = (S^R, Init^R, AP^R, L^R, \Sigma^R, T^R)$ where: (i) $S^R = \{[s]^R \mid s \in S\}$, (ii)
$Init^R = \{[s]^R \mid s \in Init\}$, (iii) $AP^R = AP$, (iv) $\forall [s]^R \in S^R \cdot L^R([s]^R) = L(s)$ (note
that this is well-defined because $R$ is propositionally compatible), (v) $\Sigma^R = \Sigma$, and
(vi) $T^R = \{([s]^R, \alpha, [s']^R) \mid (s, \alpha, s') \in T\}$, i.e., transitions are defined existentially.

In the rest of this chapter we restrict ourselves to propositionally
compatible equivalence relations. We write $[s]$ to mean $[s]^R$ when $R$ is clear from
the context. $M^R$ is often called an existential abstraction of $M$. The states of $M$ are
referred to as concrete states while those of $M^R$ are called abstract states. Quotient
LKSs have been studied in the verification literature. In particular, the following
result is well-known [39].

Theorem 25 Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS, $R$ an equivalence relation
on $S$, and $M^R$ the quotient LKS induced by $M$ and $R$. Then $M \preceq M^R$.
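
As an illustration of Definition 28, the following is a minimal sketch of quotient construction for an explicit-state LKS. The `LKS` tuple, the `quotient` function and the `block_of` map (state to class identifier) are hypothetical names chosen for this example; they are not part of the framework described here.

```python
from collections import namedtuple

# Hypothetical explicit-state encoding of an LKS (S, Init, AP, L, Sigma, T).
LKS = namedtuple("LKS", "states init ap label sigma trans")

def quotient(m, block_of):
    """Build the quotient LKS of m; block_of maps each state to its class id."""
    # Propositional compatibility: all states of a class share one labeling.
    label = {}
    for s in m.states:
        b = block_of[s]
        if label.setdefault(b, m.label[s]) != m.label[s]:
            raise ValueError("relation is not propositionally compatible")
    return LKS(
        states=set(block_of[s] for s in m.states),
        init=set(block_of[s] for s in m.init),
        ap=m.ap,
        label=label,
        sigma=m.sigma,
        # Existential transitions: one concrete witness suffices.
        trans=set((block_of[s], a, block_of[t]) for (s, a, t) in m.trans),
    )

m = LKS(
    states={0, 1, 2},
    init={0},
    ap={"p"},
    label={0: frozenset(), 1: frozenset(), 2: frozenset({"p"})},
    sigma={"a", "b"},
    trans={(0, "a", 1), (1, "b", 2)},
)
q = quotient(m, {0: "A", 1: "A", 2: "B"})
```

In this toy instance the abstract state A acquires both an a-labeled self-loop and a b-labeled transition to B, even though no single concrete state has both; this is the over-approximation that Theorem 25 relies on.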

9.4  Counterexample  Validation  and  Refinement 

Recall that our program $\mathcal{P}$ consists of a set of components $(C_1, \ldots, C_n)$ and our
program context $\Gamma$ consists of a set of component contexts $(\gamma_1, \ldots, \gamma_n)$, one for each
$C_i$. Our goal is to verify whether $\mathcal{P} \preceq Sp$.

For $i \in \{1, \ldots, n\}$ let us denote by $M_i$ the LKS obtained by predicate
abstraction of $C_i$, and let $R_i$ be an equivalence relation over $S_{M_i}$. Suppose $CW$ is
a Counterexample Witness for $(M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}) \preceq Sp$. We would now like to verify
in a component-wise manner whether $CW$ is a valid Counterexample Witness. Recall
that this involves checking whether $CW \upharpoonright \gamma_i \precsim C_i$ for $i \in \{1, \ldots, n\}$.

However, we also want to perform our validation check in a level-wise manner. In
other words, we first check, for $i \in \{1, \ldots, n\}$, whether $CW \upharpoonright \gamma_i \precsim M_i$. If this is
not the case for some $i \in \{1, \ldots, n\}$, we refine the abstraction $M_i^{R_i}$ and repeat the
simulation check. Otherwise we proceed with checking $CW \upharpoonright \gamma_i \precsim C_i$ and subsequent
refinement (if required) as described in earlier chapters.

In  our  framework,  refinement  of  action-guided  abstractions  involves  computing 
proper  refinements  of  equivalence  relations  based  on  abstract  successors.  We  now 
present  this  refinement  scheme,  beginning  with  a  few  preliminary  definitions. 
Definition 29 (Equivalence Refinement) Let $R_1$ and $R_2$ be two equivalence
relations over some set $S$. Then $R_1$ is said to be a refinement of $R_2$ iff the following
condition holds:

$$\forall s \in S \cdot [s]^{R_1} \subseteq [s]^{R_2}$$

$R_1$ is said to be a proper refinement of $R_2$ iff the following condition holds:

$$(\forall s \in S \cdot [s]^{R_1} \subseteq [s]^{R_2}) \wedge (\exists s \in S \cdot [s]^{R_1} \subset [s]^{R_2})$$

Definition 30 (Abstract Successor) Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS,
and $R \subseteq S \times S$ be an equivalence relation. Let $M^R = (S^R, Init^R, AP^R, L^R, \Sigma^R, T^R)$,
$s \in S$ and $\alpha \in \Sigma$. Then the function $\widehat{Succ} : S \times \Sigma \to 2^{S^R}$ is defined as follows:

$$\widehat{Succ}(s, \alpha) = \{[s']^R \in S^R \mid s' \in Succ(s, \alpha)\}$$

In other words, $[s']^R \in S^R$ is an abstract successor of $s$ under action $\alpha$ iff $M$ has
an $\alpha$-labeled transition from $s$ to some element of $[s']^R$.

9.4.1  Splitting  Equivalence  Classes 

Given $M$, $R$, $[s]^R \in S^R$ and $A \subseteq \Sigma$, we denote by $Split(M, R, [s]^R, A)$ the equivalence
relation obtained from $R$ by sub-partitioning the equivalence class $[s]^R$ according to
the following scheme: $\forall s_1, s_2 \in [s]^R$, $s_1$ and $s_2$ belong to the same sub-partition of
$[s]^R$ iff $\forall \alpha \in A \cdot \widehat{Succ}(s_1, \alpha) = \widehat{Succ}(s_2, \alpha)$.

Note that the equivalence classes (abstract states) other than $[s]^R$ are left
unchanged. Recall Definition 29 of refinement between equivalence relations. It is easy
to see that $Split(M, R, [s]^R, A)$ is a refinement of $R$. In addition, $Split(M, R, [s]^R, A)$
is a proper refinement of $R$ iff $[s]^R$ is split into more than one piece, i.e., if the following
condition holds:

$$\exists \alpha \in A \cdot \exists s_1 \in [s]^R \cdot \exists s_2 \in [s]^R \cdot \exists [s']^R \in S^R \cdot [s']^R \in \widehat{Succ}(s_1, \alpha) \wedge [s']^R \notin \widehat{Succ}(s_2, \alpha) \qquad (9.1)$$

The correctness of the above claim is easy to see since if Condition 9.1 holds, then
according to our definition, $s_1$ and $s_2$ must belong to different sub-partitions of $[s]^R$
after the application of $Split$.
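
The splitting scheme can be sketched as follows. This is only an illustration under assumed encodings: transitions are a set of `(source, action, target)` triples, `block_of` maps each concrete state to a class identifier, and the function names are hypothetical.

```python
def abstract_succ(trans, block_of, s, a):
    """Abstract successors of concrete state s under action a."""
    return frozenset(block_of[t] for (u, act, t) in trans if u == s and act == a)

def split(trans, block_of, target, actions):
    """Sub-partition class `target`; returns a new state-to-class map."""
    refined = dict(block_of)
    for s, b in block_of.items():
        if b == target:
            # States with identical abstract-successor signatures stay together.
            sig = tuple(abstract_succ(trans, block_of, s, a) for a in sorted(actions))
            refined[s] = (target, sig)
    return refined

def is_proper(block_of, refined):
    """True iff the refinement has strictly more classes than the original."""
    return len(set(refined.values())) > len(set(block_of.values()))

trans = {(0, "a", 2), (1, "a", 3)}
block_of = {0: "A", 1: "A", 2: "B", 3: "C"}
refined = split(trans, block_of, "A", {"a"})    # 0 and 1 are separated
unchanged = split(trans, block_of, "A", {"b"})  # no b-transitions: no split
```

Here states 0 and 1 disagree on their abstract a-successors (B versus C), so Condition 9.1 holds and the refinement is proper; splitting on an action with no outgoing transitions leaves the class in one piece.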


9.4.2  Checking  Validity  of  a  Counterexample  Witness 

Let $M$ be an LKS obtained by predicate abstraction from a component and $R$ be
an equivalence relation on the states of $M$. Let $CW$ be a Counterexample Witness
projection such that $CW \precsim M^R$ where $M^R$ is the quotient LKS induced by $M$
and $R$. Recall that we are interested in checking whether $CW \precsim M$. This is achieved
by algorithm WeakSimulAG shown in Procedure 9.1. WeakSimulAG first invokes
algorithm CanSimulAG (shown in Procedure 9.2) to compute the set $S$ of states of $M$
which can weakly simulate the initial state $Init$ of $CW$. It then returns TRUE if some
initial state of $M$ belongs to $S$ and FALSE otherwise.


Procedure 9.1 WeakSimulAG returns TRUE if $CW \precsim M$ and FALSE otherwise.
Algorithm WeakSimulAG($CW$, $M$)
    - $CW$ : is a Counterexample Witness projection, $M$ : is an LKS
    let $Init$ = initial state of $CW$;
    $S$ := CanSimulAG($CW$, $Init$, $M$);
    let $Init'$ = set of initial states of $M$;
    return ($S \cap Init' \neq \emptyset$);




Procedure 9.2 CanSimulAG returns the set of states of $M$ which can weakly simulate $s$.
Algorithm CanSimulAG($CW$, $s$, $M$)
    - $CW$ : is a Counterexample Witness projection, $s$ : is a state of $CW$
    - $M$ : is an LKS
    let $CW = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$;
    let $M = (S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$;
    $S$ := Restrict($S_2$, $L_1(s)$); // $S$ = subset of $S_2$ with same propositional labeling as $s$
    for each $s \xrightarrow{\alpha} s' \in T_1$ // $s'$ is a successor state of $s$
        $S'$ := CanSimulAG($CW$, $s'$, $M$); // compute result for successor
        if ($\alpha \neq \tau$) then $S'$ := $\{s'' \in S_2 \mid Succ(s'', \alpha) \cap S' \neq \emptyset\}$; // take non-$\tau$ pre-image
        $S$ := $S \cap S'$; // update result
    return $S$;
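
Procedures 9.1 and 9.2 can be sketched in executable form as below. The encoding (transition triples, label dictionaries, the name `TAU` for the silent action) is an assumption made for this example only. The recursion terminates because the Counterexample Witness projection is assumed to be tree-shaped.

```python
TAU = "tau"  # assumed name for the silent action

def can_simul(cw_trans, cw_label, s, m_states, m_label, m_trans):
    """States of M that can weakly simulate state s of the tree-shaped CW."""
    result = {t for t in m_states if m_label[t] == cw_label[s]}
    for (u, a, s2) in cw_trans:
        if u != s:
            continue
        sub = can_simul(cw_trans, cw_label, s2, m_states, m_label, m_trans)
        if a != TAU:
            # Pre-image: states of M with an a-successor inside sub.
            sub = {x for (x, b, y) in m_trans if b == a and y in sub}
        result &= sub
    return result

def weak_simul(cw_init, cw_trans, cw_label, m_states, m_init, m_label, m_trans):
    """True iff some initial state of M can weakly simulate CW's initial state."""
    sim = can_simul(cw_trans, cw_label, cw_init, m_states, m_label, m_trans)
    return bool(sim & m_init)

# CW: c0 -a-> c1 -tau-> c2 -b-> c3 ; M: 0 -a-> 1 -b-> 2 (all labels empty).
cw_trans = [("c0", "a", "c1"), ("c1", TAU, "c2"), ("c2", "b", "c3")]
cw_label = {c: frozenset() for c in ("c0", "c1", "c2", "c3")}
m_states = {0, 1, 2}
m_label = {t: frozenset() for t in m_states}
m_trans = {(0, "a", 1), (1, "b", 2)}
ok = weak_simul("c0", cw_trans, cw_label, m_states, {0}, m_label, m_trans)
```

The silent step from c1 to c2 is absorbed: the same state of M must simulate both, which is why no pre-image is taken for it.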


9.4.3  Refining  an  Action-Guided  Abstraction 

In the previous section we described an algorithm to check if a Counterexample
Witness projection $CW$ is weakly simulated by an LKS $M$. If this is the case then we
know that $CW$ is a valid Counterexample Witness projection. However, if $CW \not\precsim M$
then we need to properly refine our equivalence relation $R$ so as to obtain a more
precise quotient LKS for the next iteration of the CEGAR loop. In this section we
present an algorithm to refine $R$ given that $CW \not\precsim M$.

We begin with the notion of a simulation map. Intuitively, given two LKSs $M_1$
and $M_2$, a simulation map is a function that maps each state of $M_1$ to a state of $M_2$
which weakly simulates it.

Definition 31 (Simulation Map) Let $M_1 = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$ and
$M_2 = (S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$ be two LKSs such that: (SM1) $M_1$ has a tree structure,
(SM2) $\tau \in \Sigma_1$, and (SM3) $\tau \notin \Sigma_2$. Then a function $\theta : S_1 \to S_2$ is said to be a
simulation map between $M_1$ and $M_2$ iff it obeys the following conditions:

$$\forall s \in S_1 \cdot L_1(s) = L_2(\theta(s))$$

$$\forall s \in Init_1 \cdot \theta(s) \in Init_2$$

$$\forall \alpha \in \Sigma_1 \setminus \{\tau\} \cdot \forall s \in S_1 \cdot \forall s' \in S_1 \cdot s \xrightarrow{\alpha} s' \implies \theta(s) \xrightarrow{\alpha} \theta(s')$$

$$\forall s \in S_1 \cdot \forall s' \in S_1 \cdot s \xrightarrow{\tau} s' \implies \theta(s) = \theta(s')$$

Clearly, if conditions SM1-SM3 above are satisfied, then there exists a simulation
map $\theta$ between $M_1$ and $M_2$ iff $M_1 \precsim M_2$. Figure 9.1 shows such a simulation map $\theta$ on
the left. Moreover, suppose we have another LKS $M_3 = (S_3, Init_3, AP_3, L_3, \Sigma_3, T_3)$
and suppose that there is a function $\nu : S_2 \to S_3$ such that the following conditions
hold:

(NU1) $\forall s \in S_2 \cdot L_2(s) = L_3(\nu(s))$

(NU2) $\forall s \in Init_2 \cdot \nu(s) \in Init_3$

(NU3) $\forall \alpha \in \Sigma_2 \cdot \forall s \in S_2 \cdot \forall s' \in S_2 \cdot s \xrightarrow{\alpha} s' \implies \nu(s) \xrightarrow{\alpha} \nu(s')$

Then it is obvious that the composition of $\theta$ and $\nu$, i.e., $\theta \circ \nu$, is a simulation map
between $M_1$ and $M_3$. Figure 9.1 shows such a composition simulation map on the
right.
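
The conditions of Definition 31 and the composition argument can be checked mechanically on small examples. The sketch below assumes a dictionary encoding of LKSs with keys `init`, `label` and `trans`; the encoding and the function names are illustrative only.

```python
TAU = "tau"  # assumed name for the silent action

def is_simulation_map(theta, cw, m):
    """Check the four conditions of a simulation map from cw to m."""
    if any(cw["label"][s] != m["label"][theta[s]] for s in theta):
        return False
    if any(theta[s] not in m["init"] for s in cw["init"]):
        return False
    for (s, a, t) in cw["trans"]:
        if a == TAU:
            if theta[s] != theta[t]:      # silent steps must not move the image
                return False
        elif (theta[s], a, theta[t]) not in m["trans"]:
            return False
    return True

def compose(theta, nu):
    """Apply theta first, then nu (written theta o nu in the text)."""
    return {s: nu[theta[s]] for s in theta}

empty = frozenset()
cw = {"init": {"c0"}, "label": {"c0": empty, "c1": empty, "c2": empty},
      "trans": {("c0", "a", "c1"), ("c1", TAU, "c2")}}
m2 = {"init": {0}, "label": {0: empty, 1: empty}, "trans": {(0, "a", 1)}}
m3 = {"init": {"X"}, "label": {"X": empty, "Y": empty}, "trans": {("X", "a", "Y")}}
theta = {"c0": 0, "c1": 1, "c2": 1}
nu = {0: "X", 1: "Y"}
composed = compose(theta, nu)
```

In this toy instance `theta` is a simulation map into `m2`, `nu` satisfies NU1-NU3, and the composed map is again a simulation map into `m3`.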
Figure 9.1: On the left is a simulation map $\theta$ between $M_1$ and $M_2$. On the right is a simulation
map $\theta \circ \nu$ between $M_1$ and $M_3$.

Let $M$ be an LKS obtained by predicate abstraction from a component and $R$ be
an equivalence relation over the states of $M$. Let $CW$ be a Counterexample Witness
projection such that $CW \precsim M^R$. Clearly there exists a simulation map $\theta$ between
$CW$ and $M^R$, where $M^R$ is the quotient LKS induced by $M$ and $R$.

Now suppose that $CW \not\precsim M$. Then we claim that there exists a state $[s] \in Range(\theta)$
and an outgoing action $\alpha$ from $[s]$ such that splitting the equivalence class
$[s]$ on the basis of $\alpha$ (cf. Section 9.4.1) will yield a proper refinement of $R$.

To understand why this claim is true, suppose the contrary. In other words,
suppose that for every $[s] \in Range(\theta)$ and for every outgoing action $\alpha$ from $[s]$,
splitting $[s]$ on the basis of $\alpha$ does not yield a proper refinement. This means that
every element of $[s]$ must have the same set of abstract successors (cf. Definition 30)
on $\alpha$. But then, it follows that we can define a mapping $\nu$ from $Range(\theta)$ to the
states of $M$ which satisfies conditions NU1-NU3 above. This would mean of course
that $\theta \circ \nu$ would be a simulation map from $CW$ to $M$ which would further imply that
$CW \precsim M$. This is clearly a contradiction.

In summary, if $CW \not\precsim M$, then there exists
a state $[s] \in Range(\theta)$ and an outgoing action $\alpha$ from $[s]$ such that splitting the
equivalence class $[s]$ on the basis of $\alpha$ will yield a proper refinement of $R$. Our
algorithm AbsRefineAG to refine $R$ is therefore very simple. For each equivalence
class $[s] \in Range(\theta)$, and for each outgoing action $\alpha$ from $[s]$, AbsRefineAG
attempts to split $[s]$ on the basis of $\alpha$. AbsRefineAG stops as soon as a proper
refinement of $R$ is obtained.
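
A minimal sketch of this search is given below. It assumes the same illustrative encoding as before (transition triples and a `block_of` map from concrete states to class identifiers); `candidates` stands for the classes in the range of the simulation map, and the function name is hypothetical.

```python
def abs_refine_ag(trans, block_of, candidates):
    """Try to split each class in `candidates` on each outgoing action;
    return the first proper refinement found, or None if there is none."""
    for target in candidates:
        members = [s for s in block_of if block_of[s] == target]
        actions = sorted({a for (u, a, _) in trans if u in members})
        for a in actions:
            groups = {}
            for s in members:
                # Abstract successors of s on action a.
                sig = frozenset(block_of[t] for (u, b, t) in trans if u == s and b == a)
                groups.setdefault(sig, []).append(s)
            if len(groups) > 1:  # proper refinement found, stop here
                refined = dict(block_of)
                for k, grp in enumerate(groups.values()):
                    for s in grp:
                        refined[s] = (target, a, k)
                return refined
    return None

block_of = {0: "A", 1: "A", 2: "B"}
refined = abs_refine_ag({(0, "a", 2)}, block_of, ["A"])
no_split = abs_refine_ag({(0, "a", 2), (1, "a", 2)}, block_of, ["A"])
```

In the first call only state 0 can perform action a, so class A is split; in the second call both members have the same abstract successors and no proper refinement exists.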
9.4.4  Overall  Action-Guided  Abstraction  Refinement

Our algorithm to check the validity of $CW$ at the action-guided abstraction
level is called ValidateAndRefineAG and is presented in Procedure 9.3.
ValidateAndRefineAG takes as input a composition of quotient LKSs
$\widehat{M} = M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$ and a Counterexample Witness $CW$. For each $i \in \{1, \ldots, n\}$, it attempts
to either verify that $CW \upharpoonright \gamma_i \precsim M_i$ or refine the abstraction $M_i^{R_i}$.
To do this it first invokes WeakSimulAG to check if $CW \upharpoonright \gamma_i \precsim M_i$. If
WeakSimulAG returns TRUE, it proceeds with the next index $i$. Otherwise it
invokes AbsRefineAG to construct a proper refinement of the equivalence relation
$R_i$. ValidateAndRefineAG returns TRUE if some abstraction $M_i^{R_i}$ was refined
and FALSE otherwise.


Procedure 9.3 ValidateAndRefineAG checks the validity of $CW$ at the action-guided
abstraction level. It returns FALSE if $CW$ is found to be valid. Otherwise it properly refines
some equivalence relation $R_i$ and returns TRUE.

Algorithm ValidateAndRefineAG($\widehat{M}$, $CW$)
    - $\widehat{M}$ : is a composition of quotient LKSs
    - $CW$ : is a Counterexample Witness
    let $\widehat{M} = M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$; // components of $\widehat{M}$
    for $i = 1$ to $n$ // try to refine one of the $R_i$'s using $CW$
        if ($\neg$WeakSimulAG($CW \upharpoonright \gamma_i$, $M_i$))
            AbsRefineAG($M_i^{R_i}$);
            return TRUE;
    return FALSE; // none of the quotient LKSs was refined
9.5  Two-Level  CEGAR


Algorithm  TwoLevelCEGAR,  presented  in  Procedure  9.4,  captures  the  complete 
two-level  CEGAR  algorithm  for  simulation  conformance.  It  is  very  similar  to 
SimulCEGAR  other  than  the  invocation  of  ValidateAndRefineAG  to  perform 
Counterexample  Witness  validation  and  refinement  at  the  level  of  action-guided 
abstractions.  We  now  give  a  line-by-line  explanation  of  TwoLevelCEGAR. 


Procedure 9.4 TwoLevelCEGAR checks simulation conformance between a program $\mathcal{P}$
and a specification $Sp$ in a context $\Gamma$.

Algorithm TwoLevelCEGAR($\mathcal{P}$, $Sp$, $\Gamma$)
    - $\mathcal{P}$ : is a program, $\Gamma$ : is a context for $\mathcal{P}$
    - $Sp$ : is a specification LKS
 1: let $\mathcal{P} = (C_1, \ldots, C_n)$ and $\Gamma = (\gamma_1, \ldots, \gamma_n)$;
 2: for each $i \in \{1, \ldots, n\}$
 3:     $M_i$ := predicate abstraction of $C_i$ with empty set of predicates;
 4:     $R_i$ := largest propositionally compatible equivalence on $M_i$'s states;
 5: Loop:
 6:     let $\widehat{M} = M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$;
 7:     if (SimulWitness($\widehat{M}$, $Sp$) = "$\widehat{M} \preceq Sp$") return "$\mathcal{P} \preceq Sp$";
 8:     let $CW$ = Counterexample Witness returned by SimulWitness;
 9:     if (ValidateAndRefineAG($\widehat{M}$, $CW$)) goto Loop;
10:     find $i \in \{1, \ldots, n\}$ such that $\neg$WeakSimul($CW \upharpoonright \gamma_i$, $C_i$, $\gamma_i$);
11:     if (no such $i$ found) return "$\mathcal{P} \not\preceq Sp$";
12:     else $\mathcal{CW}_i$ := $\mathcal{CW}_i \cup \{CW \upharpoonright \gamma_i\}$;
13:     if (AbsRefine($\mathcal{CW}_i$, $C_i$, $\gamma_i$) = ERROR) return ERROR;
14:     $M_i$ := AbsRefine($\mathcal{CW}_i$, $C_i$, $\gamma_i$);
15:     $R_i$ := largest propositionally compatible equivalence on $M_i$'s states;
16:     goto Loop;


Let the input program $\mathcal{P}$ consist of the sequence of $n$ components $(C_1, \ldots, C_n)$
(line 1). For each $i \in \{1, \ldots, n\}$ (line 2), TwoLevelCEGAR constructs (line 3)
a predicate abstraction $M_i$ of $C_i$ with an empty set of predicates. It also maintains
a sequence of equivalence relations $(R_1, \ldots, R_n)$ such that $R_i$ is a relation over the
states of $M_i$. It initializes (line 4) each $R_i$ to be the largest propositionally compatible
equivalence relation over the states of $M_i$.

Now TwoLevelCEGAR begins a loop (line 5) where it first constructs the
complete abstract model $\widehat{M}$ (line 6) by composing the quotient LKSs $M_1^{R_1}, \ldots, M_n^{R_n}$.
It then checks (line 7) whether $\widehat{M}$ is simulated by the specification $Sp$. If so, we
know that the original program $\mathcal{P}$ is also simulated by $Sp$ and TwoLevelCEGAR
terminates successfully. Otherwise (line 8) let $CW$ be a Counterexample Witness to
$\widehat{M} \preceq Sp$.

Now TwoLevelCEGAR validates $CW$ (line 9) at the action-guided abstraction
level by invoking algorithm ValidateAndRefineAG. If ValidateAndRefineAG
returns TRUE at line 9, some $R_i$ was refined. In this case
TwoLevelCEGAR repeats the loop from line 5. Otherwise it attempts to refine
one of the predicate abstractions. This is done in lines 10-16, exactly as in
Procedure SimulCEGAR. The final result is either (i) the production of a real
counterexample (line 11), or (ii) an error report (line 13), or (iii) a refinement of
some predicate abstraction $M_i$ (lines 14-15) and the repetition of the loop (line 16).
The correctness of TwoLevelCEGAR follows from that of SimulCEGAR and of
ValidateAndRefineAG.
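
The control flow of Procedure 9.4 can be summarized by the following skeleton. It is a sketch only: all four callbacks are hypothetical stand-ins for SimulWitness, ValidateAndRefineAG, the search for a spurious projection, and AbsRefine, and the iteration bound `max_iters` is an assumption added so that the example always terminates.

```python
def two_level_cegar(model_check, refine_ag, find_invalid, refine_pred, max_iters=1000):
    """Control-flow skeleton only; every callback is a hypothetical stand-in.

    model_check()      -> None if the abstraction conforms, else a witness
    refine_ag(cw)      -> True iff some equivalence relation R_i was refined
    find_invalid(cw)   -> index i whose projection is spurious, or None
    refine_pred(i, cw) -> False if predicate refinement fails (ERROR)
    """
    for _ in range(max_iters):
        cw = model_check()              # lines 6-8
        if cw is None:
            return "conforms"           # line 7
        if refine_ag(cw):               # line 9: cheap, action-guided level
            continue
        i = find_invalid(cw)            # line 10
        if i is None:
            return "violates"           # line 11: real counterexample
        if not refine_pred(i, cw):      # lines 12-15: predicate level
            return "error"
    return "gave up"

# Scripted run: one action-guided refinement, one predicate refinement, success.
witnesses = iter(["cw1", "cw2", None])
ag_answers = iter([True, False])
result = two_level_cegar(
    model_check=lambda: next(witnesses),
    refine_ag=lambda cw: next(ag_answers),
    find_invalid=lambda cw: 0,
    refine_pred=lambda i, cw: True,
)
```

The ordering matters: predicate abstraction is refined only once the action-guided level can no longer rule out the Counterexample Witness.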


9.6  Experimental  Results 

Our experiments were carried out with two broad goals in mind. The first goal
was to evaluate the overall effectiveness of the proposed two-level CEGAR approach
relative to the one-level scheme, particularly insofar as memory usage is concerned.
The second goal was to verify
the effectiveness of our LKS abstraction scheme by itself. To this end, we carried
out  experiments  on  39  benchmarks,  of  which  26  were  sequential  programs  and  13 
were  concurrent  programs.  Each  example  was  verified  twice,  once  with  only  predicate 
abstraction,  and  once  with  the  full  two-level  algorithm.  Tests  that  used  only  the  low- 
level  predicate  abstraction  refinement  scheme  are  marked  by  PredOnly  in  our  tables. 
In  contrast,  tests  that  also  incorporated  our  action-guided  abstraction  refinement 
procedure  are  marked  by  BothAbst.  Both  schemes  started  out  with  the  same  initial 
sets  of  predicates. 

For  each  experiment  we  measured  several  quantities:  (i)  the  size  of  the  final  state- 
space  on  which  the  property  was  proved/disproved;  note  that,  since  our  abstraction- 
refinement  scheme  produces  increasingly  refined  models,  and  since  we  reuse  memory 
from  one  iteration  to  the  next,  the  size  of  the  final  state-space  is  a  good  indicator 
of  the  maximum  memory  used,  (ii)  the  number  of  predicate  refinement  iterations 
required,  (iii)  the  number  of  action-guided  refinement  iterations  required,  (iv)  the 
total  number  of  refinement  iterations  required,  and  (v)  the  total  time  required.  In  the 
tables  summarizing  our  results,  these  measurements  are  reported  in  columns  named 
respectively  St,  Pit,  Lit,  It  and  T.  For  the  concurrent  benchmarks,  we  also  measured 
actual  memory  requirement  and  report  these  in  the  columns  named  Mem.  Note  that 
predicate  minimization  (cf.  Chapter  7)  was  turned  on  during  all  the  experiments 
described  in  this  section. 

9.6.1  Unix  Kernel  Benchmarks 

The  first  set  of  examples  was  designed  to  examine  how  our  approach  works  on  a 
wide  spectrum  of  implementations.  The  summary  of  our  results  on  these  examples 
is  presented  in  Table  9.1.  We  chose  ten  code  fragments  from  the  Linux  Kernel  2.4.0. 

                                              PredOnly              BothAbst
LOC  Description                          St    It      T       St    It      T
 27  pthread_mutex_lock (pthread)         26     1     52       16     3     54
 24  pthread_mutex_unlock (pthread)       27     1     51       13     2     56
 60  socket (socket)                     187     3   1752       44    25   2009
 24  sock_alloc (socket)                  50     2    141       14     4    154
  4  sys_send (socket)                     7     1     92        6     1     93
 11  sock_sendmsg (socket)                23     1    108       14     3    113
 27  modified pthread_mutex_lock          23     1     59       14     2     61
 24  modified pthread_mutex_unlock        27     1     61       12     2     66
 24  modified sock_alloc                  47     1    103        9     1    106
 11  modified sock_sendmsg                21     1     96       10     1     97

Table 9.1: Summary of results for Linux Kernel code. LOC and Description denote
the number of lines of code and a brief description of the benchmark source code. The
measurements for Piter and Liter have been omitted because they are insignificant. All times
are in milliseconds.

Corresponding  to  each  code  fragment  we  constructed  a  specification  from  the  Linux 
manual pages. For example, the specification in the third benchmark (see footnote 2) states that the
socket  system  call  either  properly  allocates  internal  data  structures  for  a  new  socket 
and  returns  1,  or  fails  to  do  so  and  returns  an  appropriate  negative  error  value. 


9.6.2  OpenSSL  Benchmarks 

The next set of examples was aimed at verifying larger pieces of code. Once again we
used the OpenSSL handshake implementation to design a set of 29 benchmarks. However,
unlike the previous OpenSSL benchmarks, some of these benchmarks were concurrent
and comprised both a client and a server component executing in parallel. The
specifications  were  derived  from  the  official  SSL  design  documents.  For  example,  the 
specification  for  the  first  concurrent  benchmark  states  that  the  handshake  is  always 
initiated  by  the  client. 

[Footnote 2] This benchmark was also used as socket-y in the predicate minimization
experiments described in the previous section.

The first 16 examples are sequential implementations, examining different
properties of SrvrCode and ClntCode separately. Each of these examples contains
about 350 comment-free LOC. The results for these are summarized in Table 9.2.
The remaining 13 examples test various properties of SrvrCode and ClntCode
when executed together. These examples are concurrent and consist of about 700
LOC. The results for them are summarized in Table 9.3. All OpenSSL benchmarks
other than the seventh server benchmark passed the property.

        PredOnly                        BothAbst                    Gain
 St(S1)    It      T     St(S2)   Pit    Lit     It      T         S1/S2
    597     4    141        114     7    193    200    180          5.24
   1038    10    191        114    11    275    286    210          9.11
    849    14    229        135    13    431    444    243          6.29
    525     1     18          3     1      0      1     19        175
  55363    48    762       2597    32   4432   4464   1813         21.3
   3672    14    256        930    14   1009   1023    390          3.95
  60570   120   3388        636     8    508    516    274         95.24
   3600    14    251        750    11    662    673    322          4.80
   1242    19    222        186    16    463    479    226          6.68
   1029    18    246        252    18    978    996    303          4.08
    705    12    196        213    12    644    656    226          3.31
   1038    16    206        213    14    509    523    216          4.87
   2422    16    230        483     8    366    374    190          5.01
   2338    15    218        658    16    726    742    273          3.55
   2366    19    250        665    15    716    731    269          3.56
   2422    20    257        609    15    710    725    274          3.98

Table 9.2: Summary of results for sequential OpenSSL examples. The first eight are server
benchmarks while the last eight are client benchmarks. Note that for the PredOnly case, Lit
is always zero and Pit = It. All times are in seconds. The improvement in state-space size is
given in the Gain column.

In  terms  of  state-space  size,  the  two-level  refinement  scheme  outperforms  the  one- 
level  scheme  by  factors  of  up  to  175.  The  fourth  server  benchmark  shows  particular 
improvement  with  the  two-level  approach.  In  this  benchmark,  the  property  holds  on 
the  very  initial  abstraction,  thereby  requiring  no  refinement  and  letting  us  achieve 
maximum reduction in state-space. The two-level approach is also an improvement in
terms of actual memory usage, particularly for the concurrent benchmarks. In several
instances it reduces the memory requirement by over an order of magnitude; across
Table 9.3 the gain ranges from 3.38 to 16.72.
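
The Gain column of Table 9.2 is the ratio S1/S2 of the final state-space sizes. As a quick check, the snippet below recomputes it for four rows taken from that table (the variable names are illustrative).

```python
# (S1, S2) = final state-space sizes under PredOnly and BothAbst (Table 9.2).
rows = [(597, 114), (1038, 114), (525, 3), (60570, 636)]
gains = [round(s1 / s2, 2) for s1, s2 in rows]
```

The third pair is the fourth server benchmark, where the property already holds on the initial abstraction and the ratio reaches 175.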

Finally, the two-level approach is also faster on most of the concurrent benchmarks.
In many instances it achieves a speedup by a factor of over three when compared to
the one-level scheme. The savings in time and space for the concurrent examples are
significantly higher than for the sequential ones. We expect the two-level approach
to demonstrate increasingly improved performance with the number of concurrent
components in the implementation.

            PredOnly                               BothAbst                         Gain
     St    It      T   Mem(M1)     St(S2)   Pit    Lit     It      T   Mem(M2)     M1/M2
 157266    12    886     1023       15840    13    742    755   1081      122       8.39
 201940    18   1645     1070        6072    10    547    557    500       64      16.72
 203728    12   1069     1003       20172    13    908    921   1805      130       7.72
 201940    17   1184      640        7808    11    439    450    482       69       9.28
 184060    16   1355      780        6240     8    384    392    407       64      12.19
 158898    11    695      426        2310     5    195    200    219       56       7.61
 103566    10    447      250        7743    11    513    524    472       74       3.38
 161580    14   1071      945        4617    11    464    475    387       64      14.77
 214989    13   1515     1475       13800     8    471    479    716      106      13.92
 118353    10    628      663        3024    12    550    562    402       60      11.05
 204708     8    794     1131        8820     5    306    311    446       79      14.32
 121170     5    303      373        2079     5    152    157    204       56       6.66
 152796    12    579      361        3780    10    404    414    349       60       6.02

Table 9.3: Summary of results for concurrent OpenSSL examples. Note that for the PredOnly
case, Lit is always zero and Pit = It. All times are in seconds and memory is in MB. The
improvement in memory requirement is given in the Gain column.
Chapter  10


Deadlock 


In  this  chapter,  we  present  an  algorithm  to  detect  deadlocks  in  concurrent  message- 
passing  programs.  Even  though  deadlock  is  inherently  non-compositional  and  its 
absence  is  not  preserved  by  standard  abstractions,  our  framework  employs  both 
abstraction  and  compositional  reasoning  to  alleviate  the  state  space  explosion  problem. 
We  iteratively  construct  increasingly  more  precise  abstractions  on  the  basis  of  spurious 
counterexamples  to  either  detect  a  deadlock  or  prove  that  no  deadlock  exists.  Our 
approach  is  inspired  by  the  counterexample  guided  abstraction  refinement  paradigm. 
However,  our  notion  of  abstraction  as  well  as  our  schemes  for  verification  and 
abstraction  refinement  differ  in  key  respects  from  existing  abstraction  refinement 
frameworks.  Our  algorithm  is  also  compositional  in  that  abstraction,  counterexample 
validation,  and  refinement  are  all  carried  out  component-wise  and  do  not  require  the 
construction  of  the  complete  state  space  of  the  concrete  system  under  consideration. 
Finally,  our  approach  is  completely  automated  and  provides  diagnostic  feedback  in 
case a deadlock is detected. We have implemented our technique in the MAGIC
verification tool and present encouraging results (up to 20 times speed-up in time
and 4 times less memory consumption) with concurrent message-passing C programs.
We also report a bug in the real-time operating system μC/OS-II.


10.1  Introduction 

Ensuring  that  standard  software  components  are  assembled  in  a  way  that  guarantees 
the  delivery  of  reliable  services  is  an  important  task  for  system  designers.  Certifying 
the  absence  of  deadlock  in  a  composite  system  is  an  example  of  a  stringent  requirement 
that  has  to  be  satisfied  before  the  system  can  be  deployed  in  real  life.  This  is  especially 
true  for  safety-critical  systems,  such  as  embedded  systems  or  controllers,  that  are 
expected  to  always  service  requests  within  a  fixed  time  limit  or  be  responsive  to 
external  stimuli.  In  addition,  many  formal  analysis  techniques,  such  as  temporal 
logic  model  checking  [32,39],  assume  that  the  systems  being  analyzed  are  deadlock- 
free.  In  order  for  the  results  of  such  analysis  to  be  valid,  one  usually  needs  to  establish 
deadlock freedom separately. Last but not least, in case a deadlock is detected, it
is highly desirable to be able to provide system designers and implementers with
appropriate diagnostic feedback.

However, despite significant efforts, validating the absence of deadlock in systems
of  realistic  complexity  remains  a  major  challenge.  The  problem  is  especially  acute  in 
the  context  of  concurrent  programs  that  communicate  via  mechanisms  with  blocking 
semantics,  e.g.,  synchronous  message-passing  and  semaphores.  The  primary  obstacle 
is  the  well-known  state  space  explosion  problem  whereby  the  size  of  the  state  space 
of  a  concurrent  system  increases  exponentially  with  the  number  of  components.  Two 
paradigms  are  usually  recognized  as  being  the  most  effective  against  the  state  space 
explosion  problem:  abstraction  and  compositional  reasoning.  Even  though  these  two 
approaches have been widely studied in the context of formal verification [36, 64, 67,
84],  they  find  much  less  use  in  deadlock  detection.  This  is  possibly  a  consequence 
of  the  fact  that  deadlock  is  inherently  non-compositional  and  its  absence  is  not 
preserved  by  standard  abstractions  (see  Example  20).  Therefore,  a  compositional 
and  abstraction-based  deadlock  detection  scheme,  such  as  the  one  we  present  in  this 
chapter,  is  especially  significant. 

Counterexample  guided  abstraction  refinement  [76]  (CEGAR  for  short)  is  a 
methodology  that  uses  abstraction  in  an  automated  manner  and  has  been  successful 
in  verifying  real-life  hardware  [37]  and  software  [6]  systems.  A  CEGAR-based 
scheme  iteratively  computes  more  and  more  precise  abstractions  (starting  with  a 
very  coarse  one)  of  a  target  system  on  the  basis  of  spurious  counterexamples  until 
a  real  counterexample  is  obtained  or  the  system  is  found  to  be  correct.  The 
approach  presented  in  this  chapter  combines  both  abstraction  and  compositional 
reasoning  within  a  CEGAR-based  framework  for  verifying  the  absence  of  deadlocks 
in concurrent message-passing systems. More precisely, suppose we have a system
$M$ composed of components $M_1, \ldots, M_n$ executing concurrently. Then our technique
checks for deadlock in $M$ using the following three-step iterative process:

1. Abstract. Create an abstraction $\widehat{M}$ such that if $M$ has a deadlock, then so
does $\widehat{M}$. This is done component-wise without having to construct the full state
space of $M$.

2. Verify. Check if $\widehat{M}$ has a deadlock. If not, report absence of deadlock in $M$
and exit. Otherwise let $\pi$ be a counterexample that leads to a deadlock in $\widehat{M}$.

3. Refine. Check if $\pi$ corresponds to a deadlock in $M$. Once again this is achieved
component-wise. If $\pi$ corresponds to a real deadlock, report presence of deadlock
in $M$ along with appropriate diagnostic feedback and exit. Otherwise refine $\widehat{M}$
on the basis of $\pi$ to obtain a more precise abstraction and repeat from step 1.
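
The three steps above can be sketched as a generic loop. This is an illustration of the control flow only: the four callbacks and the iteration bound `max_iters` are hypothetical and do not correspond to actual MAGIC interfaces.

```python
def deadlock_cegar(abstract, find_deadlock, is_real, refine, max_iters=100):
    """Abstract-verify-refine skeleton; all four callbacks are hypothetical."""
    model = abstract()                   # step 1: initial coarse abstraction
    for _ in range(max_iters):
        pi = find_deadlock(model)        # step 2: verify the abstraction
        if pi is None:
            return ("deadlock-free", None)
        if is_real(pi):                  # step 3: validate the counterexample
            return ("deadlock", pi)
        model = refine(model, pi)        # refined model is the next abstraction
    return ("unknown", None)

# Toy run: the "model" is a precision level; spurious deadlocks vanish at level 2.
outcome = deadlock_cegar(
    abstract=lambda: 0,
    find_deadlock=lambda m: None if m >= 2 else ("trace", m),
    is_real=lambda pi: False,
    refine=lambda m, pi: m + 1,
)
```

In the toy run two spurious counterexamples are eliminated by refinement before the abstraction is precise enough to show deadlock freedom.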

In  our  approach,  systems  as  well  as  their  components  are  represented  as  LKSs. 
Note  that  only  the  verification  stage  (step  2)  of  our  technique  requires  explicit 
composition  of  systems.  All  other  stages  can  be  performed  one  component  at  a  time. 
Since  verification  is  performed  only  on  abstractions  (which  are  usually  much  smaller 
than  the  corresponding  concrete  systems),  this  technique  is  able  to  significantly  reduce 
the  state  space  explosion  problem.  Finally,  when  a  deadlock  is  detected,  our  scheme 
provides  useful  diagnostic  feedback  in  the  form  of  counterexamples. 

To  the  best  of  our  knowledge,  this  is  the  first  counterexample  guided, 
compositional  abstraction  refinement  scheme  to  perform  deadlock  detection  on 
concurrent  systems.  We  have  implemented  our  approach  in  our  C  verification  tool 
MAGIC  [80]  which  extracts  LKS  models  from  C  programs  automatically  via  predicate 
abstraction [24, 63]. Our experiments with a variety of benchmarks have yielded
encouraging results (up to 20 times speed-up in time and 4 times less memory
consumption). We have also discovered a bug in the real-time operating system
μC/OS-II.

The  rest  of  this  chapter  is  organized  as  follows.  In  Section  10.2  we  summarize 
related  work.  This  is  followed  by  some  preliminary  definitions  and  results  in 
Section  10.3.  In  Section  10.4  we  present  our  abstraction  scheme,  followed  by 
counterexample  validation  and  abstraction  refinement  in  Section  10.5  and  Section  10.6 
respectively.  Our  overall  deadlock  detection  algorithm  is  described  in  Section  10.7. 
Finally, we present experimental results in Section 10.8.

10.2 Related Work


The  formalization  of  a  general  notion  of  abstraction  first  appeared  in  [47].  The 
abstractions  used  in  our  approach  are  conservative.  They  are  only  guaranteed  to 
preserve  safety  properties  of  the  system  (e.g.,  [36,75]).  Conservative  abstractions 
usually  lead  to  significant  reductions  in  the  state  space  but  in  general  require 
an  iterated  abstraction  refinement  mechanism  (such  as  CEGAR)  in  order  to 
establish specification satisfaction. CEGAR has been used, among others, in
non-automated [90] and automated [6, 29, 40, 66, 77, 98] forms.

CEGAR-based  schemes  have  been  used  for  the  verification  of  both  safety  [6, 24,  37, 
66]  (i.e.,  reachability)  and  liveness  [28]  properties.  Compositionality  has  been  most 
extensively  studied  in  process  algebra  (e.g.,  [69,85, 100]),  particularly  in  conjunction 
with  abstraction.  Abstraction  and  compositional  reasoning  have  been  combined  [23] 
within  a  single  two-level  CEGAR  scheme  to  verify  safety  properties  of  concurrent 
message-passing  C  programs.  None  of  these  techniques  attempt  to  detect  deadlock. 
In  fact,  the  abstractions  used  in  these  schemes  do  not  preserve  deadlock  freedom  and 
hence  cannot  be  used  directly  in  our  approach. 

Deadlock  detection  has  been  widely  studied  in  various  contexts.  One  of  the 
earliest  deadlock-detection  tools,  for  the  process  algebra  CSP,  was  FDR  [60];  see 
also  [17,82,83,100,101].  Corbett  has  evaluated  various  deadlock-detection  methods 
for concurrent systems [45] while Demartini et al. have developed deadlock-detection
tools for concurrent Java programs [53]. However, to the best of our knowledge, none
of these approaches involve abstraction refinement or compositionality in automated
form.

10.3 Background


Recall, from Definition 1, that an LKS is a 6-tuple $(S, Init, AP, L, \Sigma, T)$. In this
section, we present some additional preliminary definitions and results (many of which
originate from CSP [69, 100]) that are used in the rest of the chapter. In this chapter,
we will only concern ourselves with finite paths and traces (which will usually be
represented with the letters $\pi$ and $\theta$ respectively). The deadlocking behavior of a
system is dependent purely on the communication between its components. Since
communication in our framework is based purely on actions, our notion of a trace will
ignore atomic propositions. We now define paths and traces formally.

Definition 32 (Finite Path and Trace) Let $M = (S, Init, AP, L, \Sigma, T)$ be an
LKS. A finite path of $M$ is a finite sequence $\langle s_0, a_0, s_1, a_1, \ldots, a_{n-1}, s_n \rangle$ such that:
(i) $s_0 \in Init$ and (ii) $\forall\, 0 \le i < n \,.\, s_i \xrightarrow{a_i} s_{i+1}$. In such a case, the finite sequence
$\langle a_0, a_1, \ldots, a_{n-1} \rangle$ is called a finite trace of $M$.

Let $M = (S, Init, AP, L, \Sigma, T)$ be any LKS. We denote the set of all paths of $M$
by $Path(M)$. A state $s$ of $M$ is said to refuse an action $a$ iff $Succ(s, a) = \emptyset$. The
refusal of a state is the set of all actions that it refuses. Suppose $\theta \in \Sigma^*$ is a finite
sequence of actions and $F \subseteq \Sigma$ is a set of actions. Then $(\theta, F)$ is said to be a failure
of $M$ iff $M$ can participate in the sequence of actions $\theta$ and then reach a state whose
refusal is $F$. Finally, $M$ has a deadlock iff it can reach a state which refuses the entire
alphabet $\Sigma$. We now present these notions formally.

Definition 33 (Refusal) Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS. Then the
function $Ref : S \to 2^{\Sigma}$ is defined as follows:

$Ref(s) = \{ a \in \Sigma \mid Succ(s, a) = \emptyset \}$

Definition 34 (Failure) Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS. A pair $(\theta, F) \in
\Sigma^* \times 2^{\Sigma}$ is a failure of $M$ iff the following condition holds: if $\theta = \langle a_0, \ldots, a_{n-1} \rangle$, then
there exist states $s_0, s_1, \ldots, s_n$ such that (i) $\langle s_0, a_0, s_1, a_1, \ldots, a_{n-1}, s_n \rangle \in Path(M)$
and (ii) $F = Ref(s_n)$. We write $Fail(M)$ to denote the set of all failures of $M$.

Definition 35 (Deadlock) An LKS $M = (S, Init, AP, L, \Sigma, T)$ is said to have a
deadlock iff $(\theta, \Sigma) \in Fail(M)$ for some $\theta \in \Sigma^*$.

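Example 19 Figure 10.1(a) shows two LKSs $M_1 = (S_1, Init_1, AP_1, L_1, \Sigma_1, T_1)$
and $M_2 = (S_2, Init_2, AP_2, L_2, \Sigma_2, T_2)$. Let $\Sigma_1 = \{a, b, c\}$ and $\Sigma_2 = \{a, b', c\}$.
Then $M_1$ has seven paths: $\langle P \rangle$, $\langle P, a, Q \rangle$, $\langle P, a, R \rangle$, $\langle P, a, Q, b, S \rangle$,
$\langle P, a, R, b, S \rangle$, $\langle P, a, Q, b, S, c, T \rangle$, and $\langle P, a, R, b, S, c, T \rangle$. It has four traces: $\langle \rangle$, $\langle a \rangle$,
$\langle a, b \rangle$, and $\langle a, b, c \rangle$, and four failures: $(\langle \rangle, \{b, c\})$, $(\langle a \rangle, \{a, c\})$, $(\langle a, b \rangle, \{a, b\})$, and
$(\langle a, b, c \rangle, \{a, b, c\})$. Hence $M_1$ has a deadlock. Also, $M_2$ has four paths, four traces,
four failures and a deadlock. Finally, Figure 10.1(b) shows the LKS $M_1 \parallel M_2$ where
$M_1$ and $M_2$ are the LKSs shown in Figure 10.1(a).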
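
The notions of refusal, failure and deadlock can be made concrete with a small sketch. The following Python fragment is only an illustration and not part of MAGIC: it assumes an LKS is given by its initial states, its alphabet and a set of (source, action, target) triples, and it omits atomic propositions. The transitions of $M_1$ are read off the seven paths listed in Example 19; the function names are hypothetical.

```python
def succ(trans, s, a):
    """Succ(s, a): the a-successors of state s."""
    return {t for (p, b, t) in trans if p == s and b == a}


def refusal(trans, sigma, s):
    """Ref(s): the actions of sigma that s cannot perform (Definition 33)."""
    return frozenset(a for a in sigma if not succ(trans, s, a))


def failures(init, trans, sigma, max_len):
    """Failures (theta, F) of Definition 34 with len(theta) <= max_len."""
    found = set()
    frontier = {((), s) for s in init}      # pairs (trace so far, current state)
    for _ in range(max_len + 1):
        for theta, s in frontier:
            found.add((theta, refusal(trans, sigma, s)))
        frontier = {(theta + (a,), t)
                    for theta, s in frontier
                    for (p, a, t) in trans if p == s}
    return found


def has_deadlock(init, trans, sigma):
    """Definition 35: some reachable state refuses the entire alphabet."""
    seen, todo = set(init), list(init)
    while todo:
        s = todo.pop()
        if refusal(trans, sigma, s) == frozenset(sigma):
            return True
        for (p, _, t) in trans:
            if p == s and t not in seen:
                seen.add(t)
                todo.append(t)
    return False


# M1 of Example 19 (structure inferred from its listed paths).
SIGMA1 = {"a", "b", "c"}
INIT1 = {"P"}
T1 = {("P", "a", "Q"), ("P", "a", "R"), ("Q", "b", "S"),
      ("R", "b", "S"), ("S", "c", "T")}
```

Calling `failures(INIT1, T1, SIGMA1, 3)` yields the four failures listed in Example 19, and `has_deadlock(INIT1, T1, SIGMA1)` holds because state T refuses every action.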

Given a trace of a concurrent system $M_\parallel$, one can construct projections by
restricting the trace to the alphabets of each of the components of $M_\parallel$. In the
following, we will write $\theta_1 \bullet \theta_2$ to denote the concatenation of two sequences $\theta_1$ and
$\theta_2$.

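Figure 10.1: (a) Sample LKSs $M_1$ and $M_2$; (b) $M_1 \parallel M_2$.

Definition 36 (Projection) Consider LKSs $M_1, \ldots, M_n$ with alphabets $\Sigma_1, \ldots, \Sigma_n$
respectively. Let $M_\parallel = M_1 \parallel \cdots \parallel M_n$ and let us denote the alphabet of $M_\parallel$ by $\Sigma_\parallel$.
Then for $1 \le i \le n$, the projection function $Proj_i : \Sigma_\parallel^* \to \Sigma_i^*$ is defined inductively as
follows. We will write $\theta \upharpoonright i$ to mean $Proj_i(\theta)$:

1. $\langle \rangle \upharpoonright i = \langle \rangle$.

2. If $a \in \Sigma_i$ then $(\langle a \rangle \bullet \theta) \upharpoonright i = \langle a \rangle \bullet (\theta \upharpoonright i)$.

3. If $a \notin \Sigma_i$ then $(\langle a \rangle \bullet \theta) \upharpoonright i = \theta \upharpoonright i$.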
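
The three clauses of Definition 36 translate directly into a recursive function. The sketch below is illustrative only: it assumes traces are Python tuples of action names, and the name `project` is hypothetical.

```python
def project(theta, sigma_i):
    """Proj_i(theta): restrict the trace theta to the alphabet sigma_i."""
    if not theta:                       # clause 1: the empty trace
        return ()
    head, tail = theta[0], theta[1:]
    rest = project(tail, sigma_i)
    # clause 2 keeps an action of sigma_i, clause 3 drops any other action
    return (head,) + rest if head in sigma_i else rest


# Alphabets of Example 19 and an arbitrary sequence over their union.
SIGMA1 = {"a", "b", "c"}
SIGMA2 = {"a", "b'", "c"}
THETA = ("a", "b", "b'", "c")
```

Here `project(THETA, SIGMA1)` drops `b'` and `project(THETA, SIGMA2)` drops `b`, while the shared actions `a` and `c` survive in both projections.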

Definition  5  for  the  parallel  composition  of  LKSs  and  Definition  36  immediately 
lead  to  the  following  theorem,  which  essentially  highlights  the  compositional  nature 
of  failures.  Its  proof,  as  well  as  the  proofs  of  related  results,  are  well-known  [100]. 

Theorem 26 Let $M_1, \ldots, M_n$ be LKSs and let $M_\parallel = M_1 \parallel \cdots \parallel M_n$. Then
$(\theta, F) \in Fail(M_\parallel)$ iff there exist refusals $F_1, \ldots, F_n$ such that: (i) $F = \bigcup_{i=1}^{n} F_i$,
and (ii) for $1 \le i \le n$, $(\theta \upharpoonright i, F_i) \in Fail(M_i)$.

10.4 Abstraction


In  this  section  we  present  our  notion  of  abstraction.  Once  again,  we  employ  quotient 
LKSs  as  abstractions  of  concrete  LKSs.  Recall  that  given  a  concrete  LKS  M,  one  can 
obtain  a  quotient  LKS  as  follows.  The  states  of  the  quotient  LKS  are  obtained  by 
grouping  together  states  of  M ;  alternatively,  one  can  view  these  groups  as  equivalence 
classes  of  some  equivalence  relation  on  the  set  of  states  of  M.  Transitions  of  the 
quotient  LKS  are  defined  existentially.  We  now  present  a  formal  definition  of  these 
concepts. 

Definition 37 (Quotient LKS) Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS and
$R \subseteq S \times S$ be an equivalence relation. For an arbitrary $s \in S$ we let $[s]^R$
denote the equivalence class of $s$. $M$ and $R$ then induce a quotient LKS $M^R =
(S^R, Init^R, AP^R, L^R, \Sigma^R, T^R)$ where: (i) $S^R = \{[s]^R \mid s \in S\}$, (ii) $Init^R = \{[s]^R \mid s \in
Init\}$, (iii) $AP^R = AP$, (iv) $\forall [s]^R \in S^R \,.\, L^R([s]^R) = \bigcup_{s' \in [s]^R} L(s')$, (v) $\Sigma^R = \Sigma$, and
(vi) $T^R = \{([s]^R, a, [s']^R) \mid (s, a, s') \in T\}$.

Note that the crucial difference between Definition 28 and Definition 37 is that in
the latter we do not require equivalence relations to be propositionally compatible.
Instead we let the set of propositions labeling a state $[s]^R$ of $M^R$ be simply the union of
the propositions labeling the states of $M$ belonging to the equivalence class $[s]^R$. This
definition is somewhat arbitrary but suffices for our deadlock detection framework
since propositions do not play any role in the deadlocking behavior of a system.

As usual, we write $[s]$ to mean $[s]^R$ when $R$ is clear from the context. $M^R$ is often
called an existential abstraction of $M$. The states of $M$ are referred to as concrete
states while those of $M^R$ are called abstract states. We will often use $\alpha$ to represent
abstract states, and continue to denote concrete states with $s$. The following result
concerning quotient LKSs is well-known [39].


Theorem 27 Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS, $R$ an equivalence relation
on $S$, and $M^R$ the quotient LKS induced by $M$ and $R$. If $\langle s_0, a_0, s_1, a_1, \ldots, a_{n-1}, s_n \rangle \in
Path(M)$, then $\langle [s_0], a_0, [s_1], a_1, \ldots, a_{n-1}, [s_n] \rangle \in Path(M^R)$.

Example 20 Note the following facts about the LKSs in Figure 10.2: (i) $M_1$ and $M_2$
both have deadlocks but $M_1 \parallel M_2$ does not; (ii) neither $M_3$ nor $M_4$ has a deadlock but
$M_3 \parallel M_4$ does; (iii) $M_1$ has a deadlock and $M_3$ does not have a deadlock but $M_1 \parallel M_3$
has a deadlock; (iv) $M_1$ has a deadlock and $M_4$ does not have a deadlock but $M_1 \parallel M_4$
does not have a deadlock; (v) $M_1$ has a deadlock but the quotient LKS obtained by
grouping all the states of $M_1$ into a single equivalence class does not have a deadlock.


Figure  10.2:  Four  sample  LKSs  demonstrating  the  non-compositional  nature  of  deadlock. 

As  Example  20  highlights,  deadlock  is  non-compositional  and  its  absence  is 
not  preserved  by  existential  abstractions  (nor  in  fact  is  it  preserved  by  universal 
abstractions).  So  far  we  have  presented  well-known  definitions  and  results  to  prepare 
the  background.  We  now  present  what  constitute  the  core  technical  contributions  of 
this  chapter. 

We begin by taking a closer look at the non-preservation of deadlock by existential
abstractions. Consider a quotient LKS $M^R$ and a state $[s]$ of $M^R$. It can be proved
that $Ref([s]) = \bigcap_{s' \in [s]} Ref(s')$. In other words, the refusal of an abstract state $[s]$
under-approximates the refusals of the corresponding concrete states. However, in
order to preserve deadlock we require that refusals be over-approximated. We achieve
this by taking the union of the refusals of the concrete states. This leads to the notion
of an abstract refusal, which we now define formally.

Definition 38 (Abstract Refusal) Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS,
$R \subseteq S \times S$ be an equivalence relation, and $M^R$ be the quotient LKS induced by
$M$ and $R$. Let $S^R$ be the set of states of $M^R$. Then the abstract refusal function
$\widehat{Ref} : S^R \to 2^{\Sigma}$ is defined as follows:

$\widehat{Ref}(\alpha) = \bigcup_{s \in \alpha} Ref(s)$

For a parallel composition of quotient LKSs, we extend the notion of abstract refusal
as follows. Let $M_1^{R_1}, \ldots, M_n^{R_n}$ be quotient LKSs. Let $\alpha = (\alpha_1, \ldots, \alpha_n)$ be a state of
$M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$. Then $\widehat{Ref}(\alpha) = \bigcup_{i=1}^{n} \widehat{Ref}(\alpha_i)$.
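
The difference between the ordinary refusal of an abstract state (an intersection) and its abstract refusal (a union) can be observed on a tiny instance. The sketch below is an illustration under stated assumptions: it reuses the LKS $M_1$ of Example 19 (transitions inferred from its listed paths), represents equivalence classes as frozensets, and uses hypothetical helper names. It groups all states of $M_1$ into one class, in the spirit of Example 20(v).

```python
def refusal(trans, sigma, s):
    """Ref(s) for a state s of the LKS whose transition set is trans."""
    enabled = {b for (p, b, _) in trans if p == s}
    return frozenset(sigma) - enabled


def quotient(init, trans, partition):
    """Initial states and transitions of the quotient LKS (Definition 37)."""
    cls = {s: block for block in partition for s in block}
    q_init = {cls[s] for s in init}
    q_trans = {(cls[s], a, cls[t]) for (s, a, t) in trans}   # existential lifting
    return q_init, q_trans


def abstract_refusal(trans, sigma, block):
    """Abstract refusal of Definition 38: union of the concrete refusals."""
    out = set()
    for s in block:
        out |= refusal(trans, sigma, s)
    return frozenset(out)


SIGMA1 = {"a", "b", "c"}
INIT1 = {"P"}
T1 = {("P", "a", "Q"), ("P", "a", "R"), ("Q", "b", "S"),
      ("R", "b", "S"), ("S", "c", "T")}

ALL = frozenset("PQRST")                 # the single equivalence class
Q_INIT, Q_TRANS = quotient(INIT1, T1, [ALL])
```

In the quotient, the only state `ALL` has self-loops on `a`, `b` and `c`, so `refusal(Q_TRANS, SIGMA1, ALL)` is empty and the deadlock of $M_1$ is lost. In contrast, `abstract_refusal(T1, SIGMA1, ALL)` equals the whole alphabet, which over-approximates the refusal of every concrete state in the class.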

Next,  we  introduce  the  notion  of  abstract  failures,  which  are  similar  to  failures, 
except  that  abstract  refusals  are  used  in  place  of  refusals. 

Definition 39 (Abstract Failure) Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS for
which abstract refusals are defined (i.e., $M$ is either a quotient LKS or a parallel
composition of such). A pair $(\theta, F) \in \Sigma^* \times 2^{\Sigma}$ is said to be an abstract failure of $M$
iff the following condition holds: if $\theta = \langle a_0, \ldots, a_{n-1} \rangle$, then there exist $\alpha_0, \alpha_1, \ldots, \alpha_n$
such that (i) $\langle \alpha_0, a_0, \alpha_1, a_1, \ldots, a_{n-1}, \alpha_n \rangle \in Path(M)$ and (ii) $F = \widehat{Ref}(\alpha_n)$. We
write $AbsFail(M)$ to denote the set of all abstract failures of $M$.

The following theorem essentially states that the failures of an LKS $M$ are always
subsumed by the abstract failures of its quotient LKS $M^R$.

Theorem 28 Let $M = (S, Init, AP, L, \Sigma, T)$ be an LKS, $R \subseteq S \times S$ be an equivalence
relation, and $M^R$ be the quotient LKS induced by $M$ and $R$. Then for all $(\theta, F) \in
Fail(M)$, there exists $F' \supseteq F$ such that $(\theta, F') \in AbsFail(M^R)$.

Proof. Here is a proof sketch. Let $\theta = \langle a_0, \ldots, a_{n-1} \rangle$.

1. From $(\theta, F) \in Fail(M)$ and Definition 34: let $\langle s_0, a_0, s_1, a_1, \ldots, a_{n-1}, s_n \rangle \in
Path(M)$ such that $F = Ref(s_n)$.

2. From 1 and Theorem 27: $\langle [s_0], a_0, [s_1], a_1, \ldots, a_{n-1}, [s_n] \rangle \in Path(M^R)$.

3. From 2 and Definition 39: $(\theta, \widehat{Ref}([s_n])) \in AbsFail(M^R)$.

4. From Definition 38: $\widehat{Ref}([s_n]) \supseteq Ref(s_n)$.

5. From 3, 4 and using $F' = \widehat{Ref}([s_n])$ we get our result.


□ 

As the following two theorems show, abstract failures are compositional. In other
words, the abstract failures of a concurrent system $M_\parallel$ can be decomposed naturally
into abstract failures of the components of $M_\parallel$. Proofs of Theorem 29 and Theorem 30
follow the same lines as Theorem 26.

Theorem 29 Let $M_1^{R_1}, \ldots, M_n^{R_n}$ be quotient LKSs, and $\langle \alpha_0, a_0, \ldots, a_{k-1}, \alpha_k \rangle \in
Path(M_1^{R_1} \parallel \cdots \parallel M_n^{R_n})$. Let the trace $\theta = \langle a_0, \ldots, a_{k-1} \rangle$ and the final state
$\alpha_k = (\alpha_k^1, \ldots, \alpha_k^n)$. Then for $1 \le i \le n$, $(\theta \upharpoonright i, \widehat{Ref}(\alpha_k^i)) \in AbsFail(M_i^{R_i})$.

Theorem 30 Let $M_1^{R_1}, \ldots, M_n^{R_n}$ be quotient LKSs. Then $(\theta, F) \in AbsFail(M_1^{R_1} \parallel
\cdots \parallel M_n^{R_n})$ iff there exist abstract refusals $F_1, \ldots, F_n$ such that: (i) $F = \bigcup_{i=1}^{n} F_i$, and
(ii) for $1 \le i \le n$, $(\theta \upharpoonright i, F_i) \in AbsFail(M_i^{R_i})$.

In the rest of this chapter we often make implicit use of the following
facts. Consider LKSs $M_1, \ldots, M_n$ with alphabets $\Sigma_1, \ldots, \Sigma_n$ respectively. Let
$M_1^{R_1}, \ldots, M_n^{R_n}$ be quotient LKSs. Let us denote the alphabet of $M_1 \parallel \cdots \parallel M_n$
by $\Sigma_\parallel$ and the alphabet of $M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$ by $\hat{\Sigma}_\parallel$. Then $\Sigma_\parallel = \hat{\Sigma}_\parallel$. This follows
directly from the fact that the alphabet of $M_i^{R_i}$ is $\Sigma_i$ for $1 \le i \le n$. The notion of abstract
failures leads naturally to the notion of abstract deadlocks.

Definition 40 (Abstract Deadlock) Let $M_1^{R_1}, \ldots, M_n^{R_n}$ be quotient LKSs and
$\hat{M}_\parallel = M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$. Let $\Sigma$ be the alphabet of $\hat{M}_\parallel$. Then $\hat{M}_\parallel$ is said to have
an abstract deadlock iff $(\theta, \Sigma) \in AbsFail(\hat{M}_\parallel)$ for some $\theta \in \Sigma^*$.

Let $M_1^{R_1}, \ldots, M_n^{R_n}$ be quotient LKSs and $\hat{M}_\parallel = M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$ with alphabet $\Sigma$.
Clearly, $\hat{M}_\parallel$ has an abstract deadlock iff there exists a path $\langle \alpha_0, a_0, \alpha_1, a_1, \ldots, a_{n-1}, \alpha_n \rangle$
of $\hat{M}_\parallel$ such that $\widehat{Ref}(\alpha_n) = \Sigma$. We call such a path a counterexample to abstract
deadlock freedom, or simply an abstract counterexample. It is easy to devise
an algorithm to check whether $\hat{M}_\parallel$ has an abstract deadlock and also generate a
counterexample in case an abstract deadlock is detected. We call this algorithm
AbsDeadlock.

Suppose the alphabet of $\hat{M}_\parallel$ is $\Sigma$. Then AbsDeadlock explores the reachable
states of $\hat{M}_\parallel$ in, say, breadth-first manner. For each state $\alpha$ of $\hat{M}_\parallel$, it checks if
$\widehat{Ref}(\alpha) = \Sigma$. If so, it generates a counterexample from an initial state of $\hat{M}_\parallel$ to $\alpha$,
reports "abstract deadlock" and terminates. If no state $\alpha$ with $\widehat{Ref}(\alpha) = \Sigma$ can be
found, it reports "no abstract deadlock" and terminates. Since $\hat{M}_\parallel$ has a finite number
of states and transitions, AbsDeadlock always terminates with the correct answer.
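
The breadth-first search just described can be sketched as follows. This is an illustrative sketch and not the MAGIC implementation. It assumes each abstract component is supplied as a tuple (initial abstract states, alphabet, abstract transitions, abstract refusal of every abstract state), and it assumes the usual CSP-style parallel composition in which components synchronize on shared actions and move independently on the others. All names are hypothetical.

```python
from collections import deque
from itertools import product


def abs_deadlock(components):
    """Return an abstract counterexample [alpha_0, a_0, ..., alpha_k], or None."""
    sigma = set().union(*(c[1] for c in components))
    starts = list(product(*(c[0] for c in components)))
    parent = {s: None for s in starts}      # state -> (predecessor, action)
    queue = deque(starts)
    while queue:
        state = queue.popleft()
        # Abstract refusal of a composite state: union over the components.
        refused = set().union(*(c[3][x] for c, x in zip(components, state)))
        if refused == sigma:
            path = [state]
            while parent[path[0]] is not None:
                prev, a = parent[path[0]]
                path[:0] = [prev, a]
            return path
        for a in sorted(sigma):
            options = []
            for (_, alphabet, trans, _), x in zip(components, state):
                if a in alphabet:           # this component must take part in a
                    options.append([t for (p, b, t) in trans if p == x and b == a])
                else:                       # this component stays where it is
                    options.append([x])
            for nxt in product(*options):
                if nxt not in parent:
                    parent[nxt] = (state, a)
                    queue.append(nxt)
    return None


# Coarsest abstraction of M1 from Example 19: one abstract state with self-loops.
B = frozenset("PQRST")
COARSE = ({B}, {"a", "b", "c"},
          {(B, "a", B), (B, "b", B), (B, "c", B)},
          {B: frozenset({"a", "b", "c"})})
# A one-state component that can always perform a.
LOOP = ({"x"}, {"a"}, {("x", "a", "x")}, {"x": frozenset()})
```

On `[COARSE]` the search stops at the initial abstract state and returns the counterexample `[(B,)]` with the empty trace. On `[LOOP]` no state has abstract refusal equal to the alphabet, so the result is `None`.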

The following lemma shows that abstract deadlock freedom in the composition
of quotient LKSs entails deadlock freedom in the composition of the corresponding
concrete LKSs.

Lemma 1 Let $M_1, \ldots, M_n$ be LKSs and $R_1, \ldots, R_n$ be equivalence relations on the
states of $M_1, \ldots, M_n$ respectively. If $M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$ does not have an abstract
deadlock then $M_1 \parallel \cdots \parallel M_n$ does not have a deadlock.

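Proof. It suffices to prove the contrapositive. Let us denote $M_1 \parallel \cdots \parallel M_n$ by $M_\parallel$
and $M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$ by $\hat{M}_\parallel$. We know that $M_\parallel$ and $\hat{M}_\parallel$ have the same alphabet.
Let this alphabet be $\Sigma$. Now suppose $M_\parallel$ has a deadlock.

1. By Definition 35: $(\theta, \Sigma) \in Fail(M_\parallel)$ for some $\theta$.

2. From 1 and Theorem 26: there exist $F_1, \ldots, F_n$ such that: (i) $\bigcup_{i=1}^{n} F_i = \Sigma$ and
(ii) for $1 \le i \le n$, $(\theta \upharpoonright i, F_i) \in Fail(M_i)$.

3. From 2(ii) and Theorem 28: for $1 \le i \le n$, $\exists F'_i \supseteq F_i$ such that $(\theta \upharpoonright i, F'_i) \in
AbsFail(M_i^{R_i})$.

4. From 2(i) and 3: $\bigcup_{i=1}^{n} F'_i \supseteq \bigcup_{i=1}^{n} F_i = \Sigma$, and hence $\bigcup_{i=1}^{n} F'_i = \Sigma$ since each
$F'_i \subseteq \Sigma$.

5. From 3, 4 and Theorem 30: $(\theta, \Sigma) \in AbsFail(\hat{M}_\parallel)$.

6. From 5 and Definition 40: $\hat{M}_\parallel$ has an abstract deadlock.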

□ 

Unfortunately, the converse of Lemma 1 does not hold (a counterexample is
not difficult to find and we leave this task to the reader). Suppose therefore that
AbsDeadlock reports an abstract deadlock for $M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$ along with an
abstract counterexample $\pi$. We must then decide whether $\pi$ also leads to a deadlock
in $M_1 \parallel \cdots \parallel M_n$ or not. This process is called counterexample validation and is
presented in the next section.

10.5 Counterexample Validation


In  this  section  we  present  our  approach  to  check  the  validity  of  an  abstract 
counterexample  returned  by  AbsDeadlock.  We  begin  by  defining  the  notion  of 
valid  counterexamples. 

Definition 41 (Valid Counterexample) Let $M_1^{R_1}, \ldots, M_n^{R_n}$ be quotient LKSs
and let $\pi = \langle \alpha_0, a_0, \ldots, a_{k-1}, \alpha_k \rangle$ be an abstract counterexample returned by
AbsDeadlock on $M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$. Let the trace $\theta = \langle a_0, \ldots, a_{k-1} \rangle$ and the final
state $\alpha_k = (\alpha_k^1, \ldots, \alpha_k^n)$. We say that $\pi$ is a valid counterexample iff for $1 \le i \le n$,
$(\theta \upharpoonright i, \widehat{Ref}(\alpha_k^i)) \in Fail(M_i)$.

A counterexample is said to be spurious iff it is not valid. Let $M$ be an arbitrary
LKS with alphabet $\Sigma$, $\theta \in \Sigma^*$ be a trace, and $F \subseteq \Sigma$ be a refusal. It is easy to design
an algorithm that takes $M$, $\theta$, and $F$ as inputs and returns TRUE if $(\theta, F) \in Fail(M)$
and FALSE otherwise. We call this algorithm IsFailure and give its pseudo-code
in Procedure 10.1. Starting with the initial states, IsFailure repeatedly computes
successors for the sequence of actions in $\theta$. If the set of successors obtained at some
point during this process is empty, then $(\theta, F) \notin Fail(M)$ and IsFailure returns
FALSE. Otherwise, if $X$ is the set of states obtained after all actions in $\theta$ have been
processed, then $(\theta, F) \in Fail(M)$ iff there exists $s \in X$ such that $Ref(s) = F$. The
correctness of IsFailure should be clear from Definition 34.

Lemma 2 Let $M_1^{R_1}, \ldots, M_n^{R_n}$ be quotient LKSs and let $\pi$ be an abstract
counterexample returned by AbsDeadlock on $M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$. If $\pi$ is a valid
counterexample then $M_1 \parallel \cdots \parallel M_n$ has a deadlock.

Procedure 10.1 IsFailure returns TRUE if $(\theta, F) \in Fail(M)$ and FALSE otherwise.

Algorithm IsFailure($M$, $\theta$, $F$)
- $M$ : is an LKS, $\theta$ : is a trace of $M$, $F$ : is a set of actions of $M$
let $M = (S, Init, AP, L, \Sigma, T)$ and $\theta = \langle a_0, \ldots, a_{n-1} \rangle$;
$X$ := $Init$;
// simulate $\theta$ on $M$
for $i$ := 0 to $n - 1$ do $X$ := $\bigcup_{s \in X} Succ(s, a_i)$;
// simulation complete, now check if the refusal of one of the end states is $F$
return $\exists s \in X \,.\, Ref(s) = F$;
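
A direct transcription of Procedure 10.1 may help to see what is being computed. The Python sketch below is illustrative only. It assumes an LKS is given by its initial states, a set of (source, action, target) triples and its alphabet, and it reuses $M_1$ of Example 19 (transitions inferred from its listed paths) as test data. The name `is_failure` is hypothetical.

```python
def is_failure(init, trans, sigma, theta, f):
    """True iff (theta, f) is a failure of the LKS, following Procedure 10.1."""
    current = set(init)
    for a in theta:
        # One simulation step: all a-successors of the current state set.
        current = {t for (p, b, t) in trans if p in current and b == a}

    def ref(s):
        enabled = {b for (p, b, _) in trans if p == s}
        return frozenset(sigma) - enabled

    # If the simulation died out, `current` is empty and the result is False.
    return any(ref(s) == frozenset(f) for s in current)


SIGMA1 = {"a", "b", "c"}
INIT1 = {"P"}
T1 = {("P", "a", "Q"), ("P", "a", "R"), ("Q", "b", "S"),
      ("R", "b", "S"), ("S", "c", "T")}
```

For instance, `is_failure(INIT1, T1, SIGMA1, ("a",), {"a", "c"})` holds, whereas `is_failure(INIT1, T1, SIGMA1, (), {"a", "b", "c"})` does not: the initial state P can still perform `a`, so an abstract counterexample with the empty trace and refusal $\Sigma_1$ would be spurious for $M_1$.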


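Proof. Let us denote $M_1 \parallel \cdots \parallel M_n$ by $M_\parallel$ and $M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$ by $\hat{M}_\parallel$. Once again
we know that $M_\parallel$ and $\hat{M}_\parallel$ have the same alphabet. Let this alphabet be $\Sigma$. Also let
$\pi = \langle \alpha_0, a_0, \ldots, a_{k-1}, \alpha_k \rangle$, $\theta = \langle a_0, \ldots, a_{k-1} \rangle$, and $\alpha_k = (\alpha_k^1, \ldots, \alpha_k^n)$.

1. Since $\pi$ is an abstract counterexample: $\widehat{Ref}(\alpha_k) = \Sigma$.

2. From 1 and Definition 38: $\bigcup_{i=1}^{n} \widehat{Ref}(\alpha_k^i) = \widehat{Ref}(\alpha_k) = \Sigma$.

3. Counterexample is valid: for $1 \le i \le n$, $(\theta \upharpoonright i, \widehat{Ref}(\alpha_k^i)) \in Fail(M_i)$.

4. From 3 and Theorem 26: $(\theta, \bigcup_{i=1}^{n} \widehat{Ref}(\alpha_k^i)) \in Fail(M_\parallel)$.

5. From 2, 4 and Definition 35: $M_\parallel$ has a deadlock.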


□ 


10.6  Abstraction  Refinement 


In case the abstract counterexample $\pi$ returned by AbsDeadlock is found to be
spurious, we wish to refine our abstraction on the basis of $\pi$ and re-attempt the
deadlock check. Recall, from Chapter 9, Definition 29, Definition 30, the definition of
$\widehat{Succ}$ and Condition 9.1. For deadlock, abstraction refinement also involves computing
proper refinements of equivalence relations based on abstract successors. This is
achieved by the algorithm AbsRefine presented in Procedure 10.2.


Procedure 10.2 AbsRefine for doing abstraction refinement.

Algorithm AbsRefine($M$, $R$, $\theta$, $F$)
- $M$ : is an LKS, $\theta$ : is a trace of $M$, $F$ : is a set of actions of $M$
- $R$ : is an equivalence relation over the states of $M$
1: let $\theta = \langle a_0, \ldots, a_{k-1} \rangle$;
2: find $\pi = \langle \alpha_0, a_0, \ldots, a_{k-1}, \alpha_k \rangle \in Path(M^R)$ such that $F = \widehat{Ref}(\alpha_k)$;
   // $\pi$ exists because of condition AR1
3: $X$ := $\alpha_0$;
4: for $i$ := 0 to $k - 1$
5:    $X$ := $(\bigcup_{s \in X} Succ(s, a_i)) \cap \alpha_{i+1}$;
6:    if $X = \emptyset$ then return Split($M$, $R$, $\alpha_i$, $\{a_i\}$);
7: return Split($M$, $R$, $\alpha_k$, $\widehat{Ref}(\alpha_k)$);


More precisely, AbsRefine takes the following as inputs: (i) an LKS $M =
(S, Init, AP, L, \Sigma, T)$, (ii) an equivalence relation $R \subseteq S \times S$, (iii) a trace $\theta \in \Sigma^*$,
and (iv) a set of actions $F \subseteq \Sigma$. In addition, the inputs to AbsRefine must
obey the following two conditions: (AR1) $(\theta, F) \in AbsFail(M^R)$ and (AR2)
$(\theta, F) \notin Fail(M)$. AbsRefine then computes and returns a proper refinement
of $R$. We now establish the correctness of AbsRefine. We consider two possible
scenarios.

1. Suppose AbsRefine returns from line 6 when the value of $i$ is $l$. Since
$\alpha_l \xrightarrow{a_l} \alpha_{l+1}$ we know that there exists $s \in \alpha_l$ such that $\alpha_{l+1} \in \widehat{Succ}(s, a_l)$.
Let $X'$ denote the value of $X$ at the end of the previous iteration. For all
$s' \in X'$, $\alpha_{l+1} \notin \widehat{Succ}(s', a_l)$. Note that $X' \ne \emptyset$ as otherwise AbsRefine would
have terminated with $i = l - 1$. Therefore, there exists $s' \in X'$ such that
$\alpha_{l+1} \notin \widehat{Succ}(s', a_l)$. Hence the call to Split at line 6 satisfies Condition 9.1 and
AbsRefine returns a proper refinement of $R$.

2. Suppose AbsRefine returns from line 7. We know that at this point $X \ne \emptyset$.
Pick an arbitrary $s \in X$. It is clear that there exist $s_0, \ldots, s_{k-1}$ such that
$\langle s_0, a_0, \ldots, s_{k-1}, a_{k-1}, s \rangle \in Path(M)$. Hence by condition AR2, $Ref(s) \ne F$.
Again $s \in \alpha_k$, and from the way $\pi$ has been chosen at line 2, $F = \widehat{Ref}(\alpha_k)$.
Hence by Definition 38, $Ref(s) \subsetneq F$. Pick $a \in \Sigma$ such that $a \in F$ and
$a \notin Ref(s)$. Then $Succ(s, a) \ne \emptyset$. Again since $a \in \widehat{Ref}(\alpha_k)$ there exists $s' \in \alpha_k$
such that $a \in Ref(s')$. Hence $Succ(s', a) = \emptyset$. Hence the call to Split at line 7
satisfies Condition 9.1 and once again AbsRefine returns a proper refinement
of $R$.
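
The two scenarios can be exercised on a small instance. The sketch below is illustrative and rests on several assumptions that go beyond this chapter. Split is defined in Chapter 9 and is not reproduced here, so the sketch assumes that Split partitions a class according to the abstract successors of its states on the given actions. It takes the abstract path $\pi$ as an input instead of searching for it. It starts the simulation from the initial states inside $\alpha_0$, so that only concrete paths of $M$ are followed. All function names are hypothetical, and the test data is $M_1$ of Example 19.

```python
def split(trans, partition, block, actions):
    """Assumed Split: group the states of block by abstract successors on actions."""
    cls = {s: b for b in partition for s in b}
    groups = {}
    for s in block:
        sig = tuple(frozenset(cls[t] for (p, b, t) in trans if p == s and b == a)
                    for a in sorted(actions))
        groups.setdefault(sig, set()).add(s)
    return (set(partition) - {block}) | {frozenset(g) for g in groups.values()}


def abs_refine(init, trans, sigma, partition, pi):
    """Refine partition along pi = [alpha_0, a_0, ..., a_{k-1}, alpha_k]."""
    blocks, theta = pi[0::2], pi[1::2]
    x = set(blocks[0]) & set(init)          # assumption: start from initial states
    for i, a in enumerate(theta):
        x = {t for (p, b, t) in trans if p in x and b == a} & blocks[i + 1]
        if not x:                           # scenario 1: the simulation dies at step i
            return split(trans, partition, blocks[i], {a})
    # Scenario 2: the trace is feasible but no end state refuses exactly F.
    ref_hat = {a for a in sigma for s in blocks[-1]
               if not any(p == s and b == a for (p, b, _) in trans)}
    return split(trans, partition, blocks[-1], ref_hat)


SIGMA1 = {"a", "b", "c"}
INIT1 = {"P"}
T1 = {("P", "a", "Q"), ("P", "a", "R"), ("Q", "b", "S"),
      ("R", "b", "S"), ("S", "c", "T")}
ALL = frozenset("PQRST")
```

With the trivial partition `{ALL}` and the abstract counterexample `[ALL]` (empty trace), the second scenario applies and the class is split into {P}, {Q, R}, {S} and {T}. With the partition {P, S}, {Q, R}, {T} and the abstract path `[{P, S}, "c", {T}]`, the first scenario applies and {P, S} is split into {P} and {S}.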


10.7  Overall  Algorithm 

In this section we present our iterative deadlock detection algorithm and establish
its correctness. Let $M_1, \ldots, M_n$ be arbitrary LKSs and $M_\parallel = M_1 \parallel \cdots \parallel M_n$. The
algorithm IterDeadlock takes $M_1, \ldots, M_n$ as inputs and reports whether $M_\parallel$ has a
deadlock or not. If there is a deadlock, it also reports a trace of each $M_i$ that would
lead to the deadlock state. Procedure 10.3 gives the pseudo-code for IterDeadlock.
It is an iterative algorithm and uses equivalence relations $R_1, \ldots, R_n$ such that, for
$1 \le i \le n$, $R_i \subseteq S_{M_i} \times S_{M_i}$. Note that initially each $R_i$ is set to the trivial equivalence
relation $S_{M_i} \times S_{M_i}$.

Theorem 31 The algorithm IterDeadlock is correct and always terminates.

Procedure 10.3 IterDeadlock for iterative deadlock detection.

Algorithm IterDeadlock($M_1, \ldots, M_n$) // $M_1, \ldots, M_n$ : are LKSs
1: for $i$ := 1 to $n$, $R_i$ := $S_{M_i} \times S_{M_i}$;
2: forever do
   // abstract and verify
3:    $x$ := AbsDeadlock($M_1^{R_1}, \ldots, M_n^{R_n}$);
4:    if ($x$ = "no abstract deadlock") then report "no deadlock" and exit;
5:    let $\pi = \langle \alpha_0, a_0, \ldots, a_{k-1}, \alpha_k \rangle$ be the counterexample reported by AbsDeadlock;
6:    let $\theta = \langle a_0, \ldots, a_{k-1} \rangle$ and $\alpha_k = (\alpha_k^1, \ldots, \alpha_k^n)$;
   // validate counterexample
7:    find $i \in \{1, 2, \ldots, n\}$ such that $\neg$IsFailure($M_i$, $\theta \upharpoonright i$, $\widehat{Ref}(\alpha_k^i)$);
8:    if no such $i$ then report "deadlock" and the $(\theta \upharpoonright i)$'s as counterexamples and exit;
   // $\pi$ is a valid counterexample, hence deadlock exists in $M_1 \parallel \cdots \parallel M_n$
9:    let $R_i$ := AbsRefine($M_i$, $R_i$, $\theta \upharpoonright i$, $\widehat{Ref}(\alpha_k^i)$);
   // refine abstraction and repeat


Proof. First we argue that both AR1 and AR2 are satisfied every time AbsRefine
is invoked on line 9. The case for AR1 follows from Theorem 29 and the fact that
$\langle \alpha_0, a_0, \ldots, a_{k-1}, \alpha_k \rangle \in Path(M_1^{R_1} \parallel \cdots \parallel M_n^{R_n})$. The case for AR2 is trivial from
line 7 and the definition of IsFailure. Next we show that if IterDeadlock terminates
it does so with the correct answer. There are two possible cases:

1. Suppose IterDeadlock exits from line 4. Then we know that $M_1^{R_1} \parallel \cdots \parallel M_n^{R_n}$
does not have an abstract deadlock. Hence by Lemma 1, $M_1 \parallel \cdots \parallel M_n$ does
not have a deadlock.

2. Otherwise, suppose IterDeadlock exits from line 8. Then we know that for
$1 \le i \le n$, $(\theta \upharpoonright i, \widehat{Ref}(\alpha_k^i)) \in Fail(M_i)$. Hence by Definition 41, $\pi$ is a valid
counterexample. Therefore, by Lemma 2, $M_1 \parallel \cdots \parallel M_n$ has a deadlock.

Finally, termination follows from the fact that the AbsRefine routine invoked on
line 9 always produces a proper refinement of the equivalence relation $R_i$. Since each
$M_i$ has only finitely many states, this process cannot proceed indefinitely. (In fact, the
abstract LKSs converge to the bisimulation quotients of their concrete counterparts,
since AbsRefine each time performs a unit step of the Paige-Tarjan algorithm [96];
however in practice deadlock freedom is often established or disproved well before the
bisimulation quotient is achieved).


□ 


10.8  Experimental  Results 

We implemented our technique in the MAGIC tool. MAGIC extracts finite LKS models
from C programs using predicate abstraction. These LKSs are then analyzed for
deadlock using the approach presented in this chapter. Once a real counterexample
$\pi$ is found at the level of the LKSs, MAGIC analyzes $\pi$ and, if necessary, creates more
refined models by inferring new predicates. Our actual implementation is therefore
a two-level CEGAR scheme. We elide details of the outer predicate
abstraction-refinement loop as it is similar to our previous work [23].

Table 10.1 summarizes our results. The ABB benchmark was provided to us by
our industrial partner, ABB Corporation [1]. It implements part of an interprocess
communication protocol (IPC-1.6) used to mediate communication in a
multi-threaded robotics control automation system developed by ABB. The implementation
is required to satisfy various safety-critical properties, in particular, deadlock freedom.
The IPC protocol supports multiple modes of communication, including synchronous
point-to-point, broadcast, publish/subscribe, and asynchronous communication.
Each of these modes is implemented in terms of messages passed between queues
owned by different threads. The protocol handles the creation and manipulation of
message queues, synchronizing access to shared data using various operating system
primitives (e.g., semaphores), and cleaning up internal state when a communication
fails or times out.

         | Plain                                            | IterDeadlock
Name     | Sm          | Sr         | I | T    | M     | Sm          | Sr    | I   | T     | M
---------+-------------+------------+---+------+-------+-------------+-------+-----+-------+-----
ABB      | 2.1 x 10^9  | *          | * | *    | 162   | 4.1 x 10^5  | 1973  | 861 | 1446  | 33.3
SSL      | 49405       | 25731      | 1 | 44   | 43.5  | 16          | 16    | 16  | 31.9  | 40.8
UCOSD-2  | 1.1 x 10b   | 5851       | 5 | 24   | 14.5  | 374         | 261   | 77  | 14.5  | 12.9
UCOSD-3  | 2.1 x 10^7  | *          | * | *    | 58.6  | 6144        | 4930  | 120 | 221.8 | 15
UCOSN-4  | 1.9 x 10^7  | 39262      | 1 | 18.1 | 14.1  | 8192        | 2125  | 30  | 8.1   | 10.5
UCOSN-5  | 9.4 x 10^8  | 4.2 x 10^5 | 1 | 253  | 52.2  | 65536       | 12500 | 37  | 80    | 12.7
UCOSN-6  | 4.7 x 10^10 | *          | * | *    | 219.3 | 5.2 x 10^5  | 71875 | 44  | 813   | 30.8
RW-4     | 1.3 x 10^9  | 8369       | 4 | 6.48 | 10.8  | 5120        | 67    | 54  | 4.40  | 10.0
RW-5     | 9.0 x 10^10 | 54369      | 4 | 35.1 | 15.9  | 24576       | 132   | 60  | 7.33  | 10.4
RW-6     | 5.8 x 10^12 | 3.5 x 10^5 | 4 | 257  | 45.2  | 1.1 x 10^5  | 261   | 66  | 12.6  | 10.8
RW-7     | 1.5 x 10^14 | *          | * | *    | 178   | 5.2 x 10^5  | 518   | 72  | 25.3  | 11.8
RW-8     | *           | *          | * | *    | *     | 2.4 x 10^6  | 1031  | 78  | 60.5  | 14.0
RW-9     | *           | *          | * | *    | *     | 1.7 x 10^7  | 2056  | 84  | 132   | 14.5
DPN-3    | 3.6 x 10^7  | 1401       | 2 | .779 | -     | 5832        | 182   | 27  | .849  | -
DPN-4    | 1.1 x 10^10 | 16277      | 2 | 11.8 | 10.9  | 1.0 x 10^5  | 1274  | 34  | 7.86  | 9.5
DPN-5    | 3.2 x 10^12 | 1.9 x 10b  | 2 |      | 28.0  | 1.9 x 10^6  | 8918  | 41  | 84.6  | 11.4
DPN-6    | 9.7 x 10^14 | *          | * | *    |       | 3.4 x 10^7  | 62426 | 48  | 831   | 26.1
DPD-9    | 3.5 x 10^22 | 11278      | 1 | 22.5 | 12.0  | 5.2 x 10^9  | 13069 | 46  | 191   | 12.2
DPD-10   | 1.1 x 10^25 | 38268      | 1 | 87.6 | 17.3  | 6.2 x 10^10 | 44493 | 51  | 755   | 18.4

Table 10.1: Experimental results. Sm = maximum # of states; Sr = # of reachable states; I
= # of iterations; T = time in seconds; M = memory in MB; time limit = 1500 sec; - indicates
negligible value; * indicates out of time; notable figures are highlighted.

In particular, we analyzed the portion of the IPC protocol that implements the
primitives for synchronous communication (approximately 1500 LOC) among multiple
threads. With this type of communication, a sender sends a message to a receiver and
blocks until an answer is received or it times out. A receiver asks for its next message
and blocks until a message is available or it times out. Whenever the receiver gets
a synchronous message, it is then expected to send a response to the sender. MAGIC
successfully verified the absence of deadlock in this implementation.

The SSL benchmark represents a deadlock-free system (approx. 700 LOC)
consisting of one OpenSSL server and one OpenSSL client. The UCOSD-n benchmarks
are derived from μC/OS-II, and consist of n threads executing concurrently. Access
to shared data is protected via locks. This implementation suffers from deadlock.
In contrast, the UCOSN-n benchmarks are deadlock-free. The RW-n benchmarks
implement a deadlock-free reader-writer system (194 LOC) with n readers, n writers,
and a controller. The controller ensures that at most one writer has access to
the critical section. Finally, the DPN-n benchmarks represent a deadlock-free
implementation of n dining philosophers (251 LOC), while DPD-n implements n
dining philosophers (163 LOC) that can deadlock. As Table 10.1 shows, even though
the implementations are of moderate size, the total state space is often quite large
due to exponential blowup.
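
The exponential blowup can be made concrete with a small calculation. The following C sketch multiplies the sizes of n identical components to obtain the size of the product state space. The helper name is hypothetical, and the figure of 18 states per component is an assumption that is not stated in the text; it is used only because 18^3 = 5832 and 18^4 = 104976 agree with the IterDeadlock Sm entries for DPN-3 and DPN-4 in Table 10.1.

```c
#include <stdint.h>

/* Hypothetical helper: number of states in the product of n identical
   components, each with per_component states. Returns 0 if the result
   does not fit in 64 bits. */
uint64_t product_state_count(uint64_t per_component, unsigned n)
{
    uint64_t total = 1;
    for (unsigned i = 0; i < n; ++i) {
        if (per_component != 0 && total > UINT64_MAX / per_component)
            return 0;                 /* overflow: too large to represent */
        total *= per_component;
    }
    return total;
}
```

Each additional component multiplies the count by the same factor, which is why adding one philosopher or one thread moves the Sm column by one or more orders of magnitude.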

All  our  experiments  were  carried  out  on  an  AMD  Athlon  XP  1600+  machine  with 
1  GB  of  RAM.  Values  under  IterDeadlock  refer  to  measurements  for  our  approach 
while  those  under  Plain  correspond  to  a  naive  approach  involving  only  predicate 
abstraction  refinement.  We  note  that  IterDeadlock  outperforms  Plain  in  almost  all 
cases.  In  many  cases  IterDeadlock  is  able  to  establish  deadlock  or  deadlock  freedom 
while  Plain  runs  out  of  time.  Even  when  both  approaches  succeed,  IterDeadlock 
can yield over 20 times speed-up in time and require over 4 times less memory
(RW-6). For the experiments involving dining philosophers with deadlock, however, Plain
performs better than IterDeadlock. This is because in these cases Plain terminates
as soon as it discovers a deadlocking scenario, without having to explore the entire
state-space. In contrast, IterDeadlock has to perform many iterations before finding
an actual deadlock.
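
The RW-6 figures quoted above follow directly from Table 10.1 (T = 257 versus 12.6 seconds, M = 45.2 versus 10.8 MB). A minimal C sketch of the arithmetic, with a hypothetical helper name:

```c
/* Hypothetical helper: how many times larger the Plain measurement is
   than the IterDeadlock measurement (time or memory). */
double improvement_factor(double plain, double iter_deadlock)
{
    return plain / iter_deadlock;
}
```

For RW-6 the time ratio is 257 / 12.6 and the memory ratio is 45.2 / 10.8, which are just above 20 and 4 respectively. For DPD-9 the same ratio on time is 22.5 / 191, below 1, which is the case where Plain is faster.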
Chapter 11


Future  Directions 


The  domain  of  software  verification  abounds  with  intriguing  and  imposing  challenges. 
In  this  dissertation  I  have  presented  a  selected  few  of  these  problems  along  with  their 
possible solutions. It is now time to step back and look at the bigger picture. In this
chapter, I will attempt to point out some significant directions that I was unable to delve
into while doing my thesis research. My aim is to answer the question: “What are
the next two or three Ph.D. theses to be written in this area” rather than “What
would I do if I had another six months to work on this”.

Security.  One  of  the  most  relevant  directions  is  the  application  of  formal  techniques 
to  detect  security  vulnerabilities  in  software.  An  important  question  to  ask  here  is: 
what  do  we  mean  by  software  security?  It  appears  to  me  that  at  some  level  all 
security  problems  are  violations  of  either  safety  (e.g.,  buffer  overflow)  or  liveness 
(e.g.,  denial  of  service)  requirements.  Yet  the  current  mechanisms  for  specifying 
safety  and  liveness  properties  are  inappropriate  with  respect  to  non-trivial  security 
claims.  Perhaps  state/event-based  reasoning  is  an  important  avenue  to  investigate. 




Another problem is the scalability of formal techniques to large programs running
into tens of thousands or even millions of lines of code. There is always a trade-off
between scalability and precision. It is possible to model check extremely large
programs using a very coarse abstraction such as the control flow graph. On the other
hand, coarser abstractions usually lead to numerous spurious counterexamples. It is
crucial to find the right balance between precision and scalability, and a
property-driven approach like CEGAR seems to be quite promising in this regard.

Certification. The notion of proof or certification is essential for the wider
applicability of model checking to safety-critical systems. A model checker is usually
an enormous untrusted computing base and hence of limited credibility if its result
is positive, i.e., if it says that a specification holds on a system. The notion of
proof-carrying code (PCC) [91] aims to increase our confidence in any system analysis. The
idea is to generate a proof of correctness of the analysis results which can be checked
by a simple trusted proof checker.

The  problem  with  PCC  is  that  the  proofs  can  be  quite  large  even  for  simple 
properties.  This  is  similar  to  the  state-space  explosion  problem  in  model  checking. 
In addition, the original formulation of PCC was restricted to the certification of
properties  such  as  memory  safety.  While  considerable  progress  has  been  made  on 
limiting  proof  sizes,  it  is  vital  that  we  extend  PCC-like  technology  to  a  richer  class  of 
specifications. 

Learning.  Even  with  a  sophisticated  approach  like  compositional  CEGAR,  one 
is  left  at  the  mercy  of  state-space  explosion  during  the  verification  step  in  the 
CEGAR  loop.  It  is  therefore  imperative  that  our  verification  step  be  as  efficient 
as  possible.  One  of  the  most  promising  techniques  for  compositional  verification  is 
assume-guarantee (AG) style reasoning. However, this approach usually involves the
manual construction of appropriate assumptions and hence is inherently difficult to
apply to large systems.

A  very  promising  development  [41]  is  the  use  of  learning  techniques  to 
automatically  construct  appropriate  assumptions  in  the  context  of  AG-style 
verification.  This  approach  has  yielded  encouraging  results  while  verifying  safety 
properties on relatively simple programs. However, its effectiveness on a wide range
of non-trivial benchmarks is yet to be evaluated. Moreover, the use of learning in the
context of non-safety specifications such as liveness and simulation is an area yet to
be explored.

I will stop with the above three important directions for future investigation. This
list is clearly non-exhaustive and I believe that as far as software analysis is concerned
we have simply scratched the surface. The gap between what we can do and what
we would like to achieve is quite staggering, and bridging this gap will be one of the
foremost problems to concern us in the foreseeable future.
Appendix A


OpenSSL  Example 


In this appendix we describe the OpenSSL benchmark used in our experimental
evaluation of the CEGAR framework for simulation conformance (cf. Section 6.5).
Recall that the benchmark consists of two components, one for the server and the
other for the client. We will first present the server and then the client. We begin
with the server source code. Essentially, the source consists of a core C procedure
ssl3_accept. The procedure simulates a state machine via a top-level while loop
and a variable s->state to keep track of the current state of the machine. The
complete source code for ssl3_accept is presented next.


A.1 Server Source

int ssl3_accept(SSL *s)
{
  BUF_MEM *buf;
  unsigned long l;
  unsigned long Time;
  unsigned long tmp;
  void (*cb)();
  long num1;
  int ret;
  int new_state;
  int state;
  int skip;
  int got_new_session;
  int *tmp___0;
  int tmp___1;
  int tmp___2;
  int tmp___3;
  int tmp___4;
  int tmp___5;
  int tmp___6;
  int tmp___7;
  long tmp___8;
  int tmp___9;
  int tmp___10;

  tmp = (unsigned long)time((time_t *)((void *)0));
  Time = tmp;
  cb = (void (*)())((void *)0);
  ret = -1;
  skip = 0;
  got_new_session = 0;
  RAND_add((void const *)(&Time), (int)sizeof(Time), (double)0);
  ERR_clear_error();
  tmp___0 = __errno_location();
  (*tmp___0) = 0;
  if ((unsigned long)s->info_callback != (unsigned long)((void *)0)) {
    cb = s->info_callback;
  } else {
    if ((unsigned long)(s->ctx)->info_callback != (unsigned long)((void *)0)) {
      cb = (s->ctx)->info_callback;
    }
  }
  s->in_handshake++;
  tmp___1 = SSL_state(s);
  if (tmp___1 & 12288) {
    tmp___2 = SSL_state(s);
    if (tmp___2 & 16384) SSL_clear(s);
  } else SSL_clear(s);
  if ((unsigned long)s->cert == (unsigned long)((void *)0)) {
    ERR_put_error(20, 128, 179, (char const *)"s3_srvr.c", 187);
    return (-1);
  }

  while (1) {
    state = s->state;
    switch (s->state) {
    case 12292:
      s->new_session = 1;
    case 16384: ;
    case 8192: ;
    case 24576: ;
    case 8195:
      s->server = 1;
      if ((unsigned long)cb != (unsigned long)((void *)0)) {
        ((*cb))(s, 16, 1);
      }
      if (s->version >> 8 != 3) {
        ERR_put_error(20, 128, 157, (char const *)"s3_srvr.c", 211);
        return (-1);
      }
      s->type = 8192;
      if ((unsigned long)s->init_buf == (unsigned long)((void *)0)) {
        buf = BUF_MEM_new();
        if ((unsigned long)buf == (unsigned long)((void *)0)) {
          ret = -1;
          goto end;
        }
        tmp___3 = BUF_MEM_grow(buf, 16384);
        if (!tmp___3) {
          ret = -1;
          goto end;
        }
        s->init_buf = buf;
      }
      tmp___4 = ssl3_setup_buffers(s);
      if (!tmp___4) {
        ret = -1;
        goto end;
      }
      s->init_num = 0;
      if (s->state != 12292) {
        tmp___5 = ssl_init_wbio_buffer(s, 1);
        if (!tmp___5) {
          ret = -1;
          goto end;
        }
        ssl3_init_finished_mac(s);
        s->state = 8464;
        (s->ctx)->stats.sess_accept++;
      } else {
        (s->ctx)->stats.sess_accept_renegotiate++;
        s->state = 8480;
      }
      break;

    case 8480: ;
    case 8481:
      s->shutdown = 0;
      ret = ssl3_send_hello_request(s);
      if (ret <= 0) {
        goto end;
      }
      (s->s3)->tmp.next_state = 8482;
      s->state = 8448;
      s->init_num = 0;
      ssl3_init_finished_mac(s);
      break;
    case 8482:
      s->state = 3;
      break;
    case 8464: ;
    case 8465: ;
    case 8466:
      s->shutdown = 0;
      ret = ssl3_get_client_hello(s);
      if (ret <= 0) {
        goto end;
      }
      got_new_session = 1;
      s->state = 8496;
      s->init_num = 0;
      break;
    case 8496: ;
    case 8497:
      ret = ssl3_send_server_hello(s);
      if (ret <= 0) {
        goto end;
      }
      if (s->hit) {
        s->state = 8656;
      } else {
        s->state = 8512;
      }
      s->init_num = 0;
      break;
    case 8512: ;
    case 8513: ;
      if (((s->s3)->tmp.new_cipher)->algorithms & 256UL) {
        skip = 1;
      } else {
        ret = ssl3_send_server_certificate(s);
        if (ret <= 0) {
          goto end;
        }
      }
      s->state = 8528;
      s->init_num = 0;
      break;

    case 8528: ;
    case 8529:
      l = ((s->s3)->tmp.new_cipher)->algorithms;
      if (s->options & 2097152UL) {
        (s->s3)->tmp.use_rsa_tmp = 1;
      } else {
        (s->s3)->tmp.use_rsa_tmp = 0;
      }
      if ((s->s3)->tmp.use_rsa_tmp) {
        goto _L___0;
      } else {
        if (l & 30UL) {
          goto _L___0;
        } else {
          if (l & 1UL) {
            if ((unsigned long)(s->cert)->pkeys[0].privatekey ==
                (unsigned long)((void *)0)) {
              goto _L___0;
            } else {
              if (((s->s3)->tmp.new_cipher)->algo_strength & 2UL) {
                tmp___6 = EVP_PKEY_size((s->cert)->pkeys[0].privatekey);
                if (((s->s3)->tmp.new_cipher)->algo_strength & 4UL) {
                  tmp___7 = 512;
                } else {
                  tmp___7 = 1024;
                }
                if (tmp___6 * 8 > tmp___7) {
                  _L___0:
                  _L:
                  ret = ssl3_send_server_key_exchange(s);
                  if (ret <= 0) {
                    goto end;
                  }
                } else {
                  skip = 1;
                }
              } else {
                skip = 1;
              }
            }
          } else {
            skip = 1;
          }
        }
      }
      s->state = 8544;
      s->init_num = 0;
      break;

    case 8544: ;
    case 8545: ;
      if (s->verify_mode & 1) {
        if ((unsigned long)(s->session)->peer != (unsigned long)((void *)0)) {
          if (s->verify_mode & 4) {
            skip = 1;
            (s->s3)->tmp.cert_request = 0;
            s->state = 8560;
          } else {
            goto _L___2;
          }
        } else {
          _L___2:
          if (((s->s3)->tmp.new_cipher)->algorithms & 256UL) {
            if (s->verify_mode & 2) {
              goto _L___1;
            } else {
              skip = 1;
              (s->s3)->tmp.cert_request = 0;
              s->state = 8560;
            }
          } else {
            _L___1:
            (s->s3)->tmp.cert_request = 1;
            ret = ssl3_send_certificate_request(s);
            if (ret <= 0) {
              goto end;
            }
            s->state = 8448;
            (s->s3)->tmp.next_state = 8576;
            s->init_num = 0;
          }
        }
      } else {
        skip = 1;
        (s->s3)->tmp.cert_request = 0;
        s->state = 8560;
      }
      break;

    case 8560: ;
    case 8561:
      ret = ssl3_send_server_done(s);
      if (ret <= 0) {
        goto end;
      }
      (s->s3)->tmp.next_state = 8576;
      s->state = 8448;
      s->init_num = 0;
      break;
    case 8448:
      num1 = BIO_ctrl(s->wbio, 3, 0L, (void *)0);
      if (num1 > 0L) {
        s->rwstate = 2;
        tmp___8 = BIO_ctrl(s->wbio, 11, 0L, (void *)0);
        num1 = (long)((int)tmp___8);
        if (num1 <= 0L) {
          ret = -1;
          goto end;
        }
        s->rwstate = 1;
      }
      s->state = (s->s3)->tmp.next_state;
      break;
    case 8576: ;
    case 8577:
      ret = ssl3_check_client_hello(s);
      if (ret <= 0) {
        goto end;
      }
      if (ret == 2) {
        s->state = 8466;
      } else {
        ret = ssl3_get_client_certificate(s);
        if (ret <= 0) {
          goto end;
        }
        s->init_num = 0;
        s->state = 8592;
      }
      break;
    case 8592: ;
    case 8593:
      ret = ssl3_get_client_key_exchange(s);
      if (ret <= 0) {
        goto end;
      }
      s->state = 8608;
      s->init_num = 0;
      ((*(((s->method)->ssl3_enc)->cert_verify_mac)))
        (s, &(s->s3)->finish_dgst1, &(s->s3)->tmp.cert_verify_md[0]);
      ((*(((s->method)->ssl3_enc)->cert_verify_mac)))
        (s, &(s->s3)->finish_dgst2, &(s->s3)->tmp.cert_verify_md[16]);
      break;
    case 8608: ;
    case 8609:
      ret = ssl3_get_cert_verify(s);
      if (ret <= 0) {
        goto end;
      }
      s->state = 8640;
      s->init_num = 0;
      break;

    case 8640: ;
    case 8641:
      ret = ssl3_get_finished(s, 8640, 8641);
      if (ret <= 0) {
        goto end;
      }
      if (s->hit) {
        s->state = 3;
      } else {
        s->state = 8656;
      }
      s->init_num = 0;
      break;
    case 8656: ;
    case 8657:
      (s->session)->cipher = (s->s3)->tmp.new_cipher;
      tmp___9 = ((*(((s->method)->ssl3_enc)->setup_key_block)))(s);
      if (!tmp___9) {
        ret = -1;
        goto end;
      }
      ret = ssl3_send_change_cipher_spec(s, 8656, 8657);
      if (ret <= 0) {
        goto end;
      }
      s->state = 8672;
      s->init_num = 0;
      tmp___10 = ((*(((s->method)->ssl3_enc)->change_cipher_state)))(s, 34);
      if (!tmp___10) {
        ret = -1;
        goto end;
      }
      break;
    case 8672: ;
    case 8673:
      ret = ssl3_send_finished(s, 8672, 8673,
                               ((s->method)->ssl3_enc)->server_finished_label,
                               ((s->method)->ssl3_enc)->server_finished_label_len);
      if (ret <= 0) {
        goto end;
      }
      s->state = 8448;
      if (s->hit) {
        (s->s3)->tmp.next_state = 8640;
      } else {
        (s->s3)->tmp.next_state = 3;
      }
      s->init_num = 0;
      break;

    case 3:
      ssl3_cleanup_key_block(s);
      BUF_MEM_free(s->init_buf);
      s->init_buf = (BUF_MEM *)((void *)0);
      ssl_free_wbio_buffer(s);
      s->init_num = 0;
      if (got_new_session) {
        s->new_session = 0;
        ssl_update_cache(s, 2);
        (s->ctx)->stats.sess_accept_good++;
        s->handshake_func = (int (*)())(&ssl3_accept);
        if ((unsigned long)cb != (unsigned long)((void *)0)) {
          ((*cb))(s, 32, 1);
        }
      }
      ret = 1;
      goto end;
    default:
      ERR_put_error(20, 128, 255, (char const *)"s3_srvr.c", 536);
      ret = -1;
      goto end;
    }
    if (!(s->s3)->tmp.reuse_message) {
      if (!skip) {
        if (s->debug) {
          ret = (int)BIO_ctrl(s->wbio, 11, 0L, (void *)0);
          if (ret <= 0) {
            goto end;
          }
        }
        if ((unsigned long)cb != (unsigned long)((void *)0)) {
          if (s->state != state) {
            new_state = s->state;
            s->state = state;
            ((*cb))(s, 8193, 1);
            s->state = new_state;
          }
        }
      }
    }
    skip = 0;
  }
end:
  s->in_handshake--;
  if ((unsigned long)cb != (unsigned long)((void *)0)) {
    ((*cb))(s, 8194, ret);
  }
  return (ret);
}


A.2 Server Library Specifications

As discussed earlier, the essential idea behind MAGIC is to model each library routine
call by an EFSM. For example, the call to ssl3_send_hello_request is modeled
by an EFSM SendHelloRequest which either does the action send_hello_request
and then returns the value 1 or simply returns -1. In MAGIC one can specify this
information via the following syntax.

cproc ssl3_send_hello_request {
  abstract {ssl3_send_hello_request_abs, 1, SendHelloRequest};
}

SendHelloRequest =
(
  send_hello_request -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).
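
Read operationally, the EFSM above offers two behaviours: perform the action and return 1, or return -1 without performing any action. A minimal C sketch of that reading follows. It is an illustration only: the Trace type, the function name and the take_success_branch parameter (which stands in for the nondeterministic choice written as | in the specification) are hypothetical and are not part of MAGIC or OpenSSL.

```c
#include <stddef.h>

#define MAX_EVENTS 8

/* Hypothetical record of the actions performed so far. */
typedef struct {
    const char *events[MAX_EVENTS];
    size_t count;
} Trace;

/* Stand-in for the SendHelloRequest EFSM: either emit the action
   send_hello_request and return 1, or return -1 with no action. */
int send_hello_request_model(int take_success_branch, Trace *trace)
{
    if (take_success_branch) {
        if (trace->count < MAX_EVENTS)
            trace->events[trace->count++] = "send_hello_request";
        return 1;
    }
    return -1;
}
```

A verifier has to account for both branches at every call site, which is why ssl3_accept tests the returned value and jumps to end when it is not positive.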


The complete input to MAGIC for all the library routines invoked by ssl3_accept
and their corresponding EFSMs is as follows:

cproc ssl3_send_hello_request {
  abstract {ssl3_send_hello_request_abs, 1, SendHelloRequest};
}

cproc ssl3_get_client_hello {
  abstract {ssl3_get_client_hello_abs, 1, GetClientHello};
}

cproc ssl3_send_server_hello {
  abstract {ssl3_send_server_hello_abs, 1, SendServerHello};
}

cproc ssl3_send_server_certificate {
  abstract {ssl3_send_server_certificate_abs, 1, SendServerCertificate};
}

cproc ssl3_send_server_key_exchange {
  abstract {ssl3_send_server_key_exchange_abs, 1, SendServerKeyExchange};
}

cproc ssl3_send_certificate_request {
  abstract {ssl3_send_certificate_request_abs, 1, SendCertificateRequest};
}

cproc ssl3_send_server_done {
  abstract {ssl3_send_server_done_abs, 1, SendServerDone};
}

cproc ssl3_check_client_hello {
  abstract {ssl3_check_client_hello_abs, 1, CheckClientHello};
}

cproc ssl3_get_client_certificate {
  abstract {ssl3_get_client_certificate_abs, 1, GetClientCertificate};
}

cproc ssl3_get_client_key_exchange {
  abstract {ssl3_get_client_key_exchange_abs, 1, GetClientKeyExchange};
}

cproc ssl3_get_cert_verify {
  abstract {ssl3_get_cert_verify_abs, 1, GetCertVerify};
}

cproc ssl3_get_finished {
  abstract {ssl3_get_finished_abs, 1, GetFinished};
}

cproc ssl3_send_change_cipher_spec {
  abstract {ssl3_send_change_cipher_spec_abs, 1, SendChangeCipherSpec};
}

cproc ssl3_send_finished {
  abstract {ssl3_send_finished_abs, 1, SendFinished};
}

SendHelloRequest =
(
  send_hello_request -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

GetClientHello =
(
  exch_client_hello -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

SendServerHello =
(
  exch_server_hello -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

SendServerCertificate =
(
  exch_server_certificate -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

SendChangeCipherSpec =
(
  send_change_cipher_spec -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

GetClientKeyExchange =
(
  key_exchange_clnt_to_srvr -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

SendServerKeyExchange =
(
  key_exchange_srvr_to_clnt -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

SendFinished =
(
  finished_srvr_to_clnt -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

SendCertificateRequest =
(
  certificate_request_srvr_to_clnt -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

CheckClientHello =
(
  check_client_hello -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

GetClientCertificate =
(
  exch_client_certificate -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

GetFinished =
(
  finished_clnt_to_srvr -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

GetCertVerify =
(
  cert_verify_clnt_to_srvr -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

SendServerDone =
(
  send_server_done -> return {$0 == 1} -> STOP |
  return {$0 == -1} -> STOP
).

A.3 Client Source


We have presented the input to MAGIC as far as the server is concerned. We now
present  the  client  component  beginning  with  the  source  code.  The  client  consists  of  a 
core  procedure  called  ssl3_connect  whose  structure  is  very  similar  to  ssl3_accept. 
In  particular,  it  also  simulates  a  state  machine  via  a  top-level  while  loop  and  a 
variable  s->state  to  keep  track  of  the  current  state  of  the  machine.  The  complete 
source  code  for  ssl3_connect  is  presented  next. 

int ssl3_connect(SSL *s)
{
  BUF_MEM *buf;
  unsigned long Time;
  unsigned long tmp;
  unsigned long l;
  long num1;
  void (*cb)();
  int ret;
  int new_state;
  int state;
  int skip;
  int *tmp___0;
  int tmp___1;
  int tmp___2;
  int tmp___3;
  int tmp___4;
  int tmp___5;
  int tmp___6;
  int tmp___7;
  int tmp___8;
  long tmp___9;

  tmp = (unsigned long)time((time_t *)((void *)0));
  Time = tmp;
  cb = (void (*)())((void *)0);
  ret = -1;
  skip = 0;
  RAND_add((void const *)(&Time), (int)sizeof(Time), (double)0);
  ERR_clear_error();
  tmp___0 = __errno_location();
  (*tmp___0) = 0;
  if ((unsigned long)s->info_callback != (unsigned long)((void *)0)) {
    cb = s->info_callback;
  } else {
    if ((unsigned long)(s->ctx)->info_callback !=
        (unsigned long)((void *)0)) {
      cb = (s->ctx)->info_callback;
    }
  }
  s->in_handshake++;
  tmp___1 = SSL_state(s);
  if (tmp___1 & 12288) {
    tmp___2 = SSL_state(s);
    if (tmp___2 & 16384) {
      SSL_clear(s);
    }
  } else {
    SSL_clear(s);
  }

  while (1) {
    state = s->state;
    switch (s->state) {
    case 12292:
      s->new_session = 1;
      s->state = 4096;
      (s->ctx)->stats.sess_connect_renegotiate++;
    case 16384: ;
    case 4096: ;
    case 20480: ;
    case 4099:
      s->server = 0;
      if ((unsigned long)cb != (unsigned long)((void *)0)) {
        ((*cb))(s, 16, 1);
      }
      if ((s->version & 65280) != 768) {
        ERR_put_error(20, 132, 157, (char const *)"s3_clnt.c", 146);
        ret = -1;
        goto end;
      }
      s->type = 4096;
      if ((unsigned long)s->init_buf == (unsigned long)((void *)0)) {
        buf = BUF_MEM_new();
        if ((unsigned long)buf == (unsigned long)((void *)0)) {
          ret = -1;
          goto end;
        }
        tmp___3 = BUF_MEM_grow(buf, 16384);
        if (!tmp___3) {
          ret = -1;
          goto end;
        }
        s->init_buf = buf;
      }
      tmp___4 = ssl3_setup_buffers(s);
      if (!tmp___4) {
        ret = -1;
        goto end;
      }
      tmp___5 = ssl_init_wbio_buffer(s, 0);
      if (!tmp___5) {
        ret = -1;
        goto end;
      }
      ssl3_init_finished_mac(s);
      s->state = 4368;
      (s->ctx)->stats.sess_connect++;
      s->init_num = 0;
      break;

    case 4368: ;
    case 4369:
      s->shutdown = 0;
      ret = ssl3_client_hello(s);
      if (ret <= 0) {
        goto end;
      }
      s->state = 4384;
      s->init_num = 0;
      if ((unsigned long)s->bbio != (unsigned long)s->wbio) {
        s->wbio = BIO_push(s->bbio, s->wbio);
      }
      break;
    case 4384: ;
    case 4385:
      ret = ssl3_get_server_hello(s);
      if (ret <= 0) {
        goto end;
      }
      if (s->hit) {
        s->state = 4560;
      } else {
        s->state = 4400;
      }
      s->init_num = 0;
      break;
    case 4400: ;
    case 4401: ;
      if (((s->s3)->tmp.new_cipher)->algorithms & 256UL) {
        skip = 1;
      } else {
        ret = ssl3_get_server_certificate(s);
        if (ret <= 0) {
          goto end;
        }
      }
      s->state = 4416;
      s->init_num = 0;
      break;
    case 4416: ;
    case 4417:
      ret = ssl3_get_key_exchange(s);
      if (ret <= 0) {
        goto end;
      }
      s->state = 4432;
      s->init_num = 0;
      tmp___6 = ssl3_check_cert_and_algorithm(s);
      if (!tmp___6) {
        ret = -1;
        goto end;
      }
      break;
    case 4432: ;
    case 4433:
      ret = ssl3_get_certificate_request(s);
      if (ret <= 0) {
        goto end;
      }
      s->state = 4448;
      s->init_num = 0;
      break;
    case 4448: ;
    case 4449:
      ret = ssl3_get_server_done(s);
      if (ret <= 0) {
        goto end;
      }
      if ((s->s3)->tmp.cert_req) {
        s->state = 4464;
      } else {
        s->state = 4480;
      }
      s->init_num = 0;
      break;

    case 4464: ;
    case 4465: ;
    case 4466: ;
    case 4467:
      ret = ssl3_send_client_certificate(s);
      if (ret <= 0) {
        goto end;
      }
      s->state = 4480;
      s->init_num = 0;
      break;
    case 4480: ;
    case 4481:
      ret = ssl3_send_client_key_exchange(s);
      if (ret <= 0) {
        goto end;
      }
      l = ((s->s3)->tmp.new_cipher)->algorithms;
      if ((s->s3)->tmp.cert_req == 1) {
        s->state = 4496;
      } else {
        s->state = 4512;
        (s->s3)->change_cipher_spec = 0;
      }
      s->init_num = 0;
      break;
    case 4496: ;
    case 4497:
      ret = ssl3_send_client_verify(s);
      if (ret <= 0) {
        goto end;
      }
      s->state = 4512;
      s->init_num = 0;
      (s->s3)->change_cipher_spec = 0;
      break;
    case 4512: ;
    case 4513:
      ret = ssl3_send_change_cipher_spec(s, 4512, 4513);
      if (ret <= 0) {
        goto end;
      }
      s->state = 4528;
      s->init_num = 0;
      (s->session)->cipher = (s->s3)->tmp.new_cipher;
      if ((unsigned long)(s->s3)->tmp.new_compression ==
          (unsigned long)((void *)0)) {
        (s->session)->compress_meth = 0;
      } else {
        (s->session)->compress_meth = ((s->s3)->tmp.new_compression)->id;
      }
      tmp___7 = ((*(((s->method)->ssl3_enc)->setup_key_block)))(s);
      if (!tmp___7) {
        ret = -1;
        goto end;
      }
      tmp___8 = ((*(((s->method)->ssl3_enc)->change_cipher_state)))(s, 18);
      if (!tmp___8) {
        ret = -1;
        goto end;
      }
      break;

    case 4528: ;
    case 4529:
      ret = ssl3_send_finished(s, 4528, 4529,
                               ((s->method)->ssl3_enc)->client_finished_label,
                               ((s->method)->ssl3_enc)->client_finished_label_len);
      if (ret <= 0) {
        goto end;
      }
      s->state = 4352;
      (s->s3)->flags &= -5L;
      if (s->hit) {
        (s->s3)->tmp.next_state = 3;
        if ((s->s3)->flags & 2L) {
          s->state = 3;
          (s->s3)->flags |= 4L;
          (s->s3)->delay_buf_pop_ret = 0;
        }
      } else {
        (s->s3)->tmp.next_state = 4560;
      }
      s->init_num = 0;
      break;
    case 4560: ;
    case 4561:
      ret = ssl3_get_finished(s, 4560, 4561);
      if (ret <= 0) {
        goto end;
      }
      if (s->hit) {
        s->state = 4512;
      } else {
        s->state = 3;
      }
      s->init_num = 0;
      break;
    case 4352:
      num1 = BIO_ctrl(s->wbio, 3, 0L, (void *)0);
      if (num1 > 0L) {
        s->rwstate = 2;
        tmp___9 = BIO_ctrl(s->wbio, 11, 0L, (void *)0);
        num1 = (long)((int)tmp___9);
        if (num1 <= 0L) {
          ret = -1;
          goto end;
        }
        s->rwstate = 1;
      }
      s->state = (s->s3)->tmp.next_state;
      break;

case  3: 

ssl3_cleanup_key_block(s) ; 

if  ((unsigned  long  )s->init_buf  !=  (unsigned  long  )((void  *)0))  { 
BUF_MEM_free (s->init_buf ) ; 
s->init_buf  =  (BUF.MEM  *)((void  *)0); 

} 

if  (!  ( (s->s3)->f lags  &  4L))  { 
ssl_f ree_wbio_buf f er (s) ; 

} 

s->init_num  =  0; 
s->new_session  =  0; 
ssl_update_cache(s, 1);
if (s->hit) {

(s->ctx)->stats.sess_hit ++;

}

ret = 1;

s->handshake_func = (int (*)())(& ssl3_connect);

(s->ctx)->stats.sess_connect_good ++;

if ((unsigned long )cb != (unsigned long )((void *)0)) {
((*cb))(s, 32, 1);

}

goto  end; 

default: ERR_put_error(20, 132, 255, (char const *)"s3_clnt.c", 418);
ret  =  -1; 
goto  end; 

} 

if  (!  (s->s3)->tmp.reuse_message)  { 
if (! skip) {
if (s->debug) {

ret = (int )BIO_ctrl(s->wbio, 11, 0L, (void *)0);
if (ret <= 0) {
goto end;

}

}

if ((unsigned long )cb != (unsigned long )((void *)0)) {
if  (s->state  !=  state)  { 
new_state  =  s->state; 
s->state  =  state; 

((*cb))(s,  4097,  1); 
s->state  =  new_state; 

} 

} 

} 

} 

skip  =  0; 


end: 

s->in_handshake --;

if  ((unsigned  long  )cb  !=  (unsigned  long  )((void  *)0))  { 
((*cb))(s,  4098,  ret); 

} 

return  (ret) ; 


A.4 Client Library Specifications


The complete input to MAGIC for all the library routines invoked by ssl3_connect
and their corresponding EFSMs is as follows:


cproc ssl3_client_hello {
abstract {ssl3_client_hello_abs, 1, Ssl3ClientHello};
}

cproc ssl3_get_server_hello {
abstract {ssl3_get_server_hello_abs, 1, Ssl3GetServerHello};
}

cproc ssl3_get_finished {
abstract {ssl3_get_finished_abs, 1, Ssl3GetFinished};
}

cproc ssl3_get_server_certificate {
abstract {ssl3_get_server_certificate_abs, 1, Ssl3GetServerCertificate};
}

cproc ssl3_send_change_cipher_spec {
abstract {ssl3_send_change_cipher_spec_abs, 1, Ssl3SendChangeCipherSpec};
}

cproc ssl3_get_key_exchange {
abstract {ssl3_get_key_exchange_abs, 1, Ssl3GetKeyExchange};
}

cproc ssl3_send_finished {
abstract {ssl3_send_finished_abs, 1, Ssl3SendFinished};
}

cproc ssl3_get_certificate_request {
abstract {ssl3_get_certificate_request_abs, 1, Ssl3GetCertificateRequest};
}

cproc ssl3_get_server_done {
abstract {ssl3_get_server_done_abs, 1, Ssl3GetServerDone};
}

cproc ssl3_send_client_certificate {
abstract {ssl3_send_client_certificate_abs, 1, Ssl3SendClientCertificate};
}

cproc ssl3_send_client_key_exchange {
abstract {ssl3_send_client_key_exchange_abs, 1, Ssl3SendClientKeyExchange};
}

cproc ssl3_send_client_verify {
abstract {ssl3_send_client_verify_abs, 1, Ssl3SendClientVerify};
}

Ssl3ClientHello  = 

( 

exch_client_hello  ->  return  {$0  ==  1}  ->  STOP  | 
return  {$0  ==  -1}  ->  STOP 

). 

Ssl3GetServerHello  = 

( 

exch_server_hello -> return {$0 == 1} -> STOP |
return  {$0  ==  -1}  ->  STOP 

). 

Ssl3GetFinished  = 

( 

finished_srvr_to_clnt -> return {$0 == 1} -> STOP |
return  {$0  ==  -1}  ->  STOP 

). 

Ssl3GetServerCertificate =

(

exch_server_certificate -> return {$0 == 1} -> STOP |
return  {$0  ==  -1}  ->  STOP 

). 

Ssl3SendChangeCipherSpec  = 

( 

send_change_cipher_spec  ->  return  {$0  ==  1}  ->  STOP  | 
return  {$0  ==  -1}  ->  STOP 

). 

Ssl3GetKeyExchange  = 

( 

key_exchange_srvr_to_clnt  ->  return  {$0  ==  1}  ->  STOP  | 
return  {$0  ==  -1}  ->  STOP 

). 
Ssl3SendFinished =

(

finished_clnt_to_srvr -> return {$0 == 1} -> STOP |
return  {$0  ==  -1}  ->  STOP 

). 

Ssl3GetCertificateRequest =

(

certificate_request_srvr_to_clnt -> return {$0 == 1} -> STOP |
return  {$0  ==  -1}  ->  STOP 

). 

Ssl3GetServerDone  = 

( 

ssl3_get_server_done  ->  return  {$0  ==  1}  ->  STOP  | 
return  {$0  ==  -1}  ->  STOP 

). 

Ssl3SendClientCertificate =

(

exch_client_certificate -> return {$0 == 1} -> STOP |
return  {$0  ==  -1}  ->  STOP 

). 

Ssl3SendClientKeyExchange  = 

( 

key_exchange_clnt_to_srvr  ->  return  {$0  ==  1}  ->  STOP  | 
return  {$0  ==  -1}  ->  STOP 

). 

Ssl3SendClientVerify  = 

( 

cert_verify_clnt_to_srvr  ->  return  {$0  ==  1}  ->  STOP  | 
return  {$0  ==  -1}  ->  STOP 

). 
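
Each EFSM above can be read as a non-deterministic stub for the corresponding
library routine: the routine either performs its protocol action and returns 1,
or returns -1 without performing the action. The following C fragment is only
an illustrative sketch of this reading for Ssl3ClientHello. The type
AbstractOutcome, the function ssl3_client_hello_model and its parameter succeed
are hypothetical names introduced here for illustration; they are not part of
MAGIC or OpenSSL, and the sketch makes no claim about how MAGIC represents
EFSMs internally.

```c
#include <stddef.h>

/* Hypothetical record of one run of an abstracted library routine. */
typedef struct {
    const char *action; /* action performed before returning, or NULL */
    int ret;            /* value bound to $0 in the return action */
} AbstractOutcome;

/*
 * Hypothetical model of the EFSM Ssl3ClientHello.  The parameter `succeed`
 * stands in for the non-deterministic choice between the two branches.
 */
AbstractOutcome ssl3_client_hello_model(int succeed)
{
    AbstractOutcome out;
    if (succeed) {
        /* exch_client_hello -> return {$0 == 1} -> STOP */
        out.action = "exch_client_hello";
        out.ret = 1;
    } else {
        /* return {$0 == -1} -> STOP */
        out.action = NULL;
        out.ret = -1;
    }
    return out;
}
```

The other EFSMs in this section follow the same two-branch pattern and differ
only in the name of the action.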
A.5 Complete OpenSSL Specification


We have presented the input to MAGIC for both the OpenSSL client and server
components. We now present the specification against which MAGIC checks
simulation conformance. Note that the actions appearing in the specification
must be the same as those in the alphabet of the EFSMs presented earlier. Also
note that the specification is non-deterministic. It therefore blows up during
determinization, which contributes to the improved performance of simulation
when compared to trace containment (cf. Section 6.5).
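
To illustrate why simulation does not require determinizing the specification,
the following C fragment sketches the textbook fixpoint computation of a
simulation relation between two small labeled transition systems. It is a
simplified sketch under stated assumptions: it ignores epsilon transitions,
state labels and the data predicates attached to return actions, and it is not
the algorithm implemented in MAGIC. The names Transition, Lts, MAX_STATES,
spec_can_match and simulated_by are hypothetical and introduced only for this
example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_STATES 16 /* assumed upper bound on the states of either system */

/* One labeled transition: src --action--> dst. */
typedef struct {
    int src;
    int action;
    int dst;
} Transition;

/* A finite labeled transition system with states 0 .. num_states-1. */
typedef struct {
    int num_states;
    size_t num_trans;
    const Transition *trans;
} Lts;

/* True if spec state s has an `action` move into a state still related to i_dst. */
static bool spec_can_match(const Lts *spec, int s, int action, int i_dst,
                           bool rel[MAX_STATES][MAX_STATES])
{
    for (size_t k = 0; k < spec->num_trans; k++) {
        const Transition *t = &spec->trans[k];
        if (t->src == s && t->action == action && rel[i_dst][t->dst]) {
            return true;
        }
    }
    return false;
}

/*
 * Returns true if spec state s0 simulates impl state i0.  The relation starts
 * as all pairs and pairs are deleted until nothing changes (greatest fixpoint).
 * Both systems are assumed to have at most MAX_STATES states.
 */
bool simulated_by(const Lts *impl, int i0, const Lts *spec, int s0)
{
    bool rel[MAX_STATES][MAX_STATES];
    for (int i = 0; i < impl->num_states; i++) {
        for (int s = 0; s < spec->num_states; s++) {
            rel[i][s] = true;
        }
    }

    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t k = 0; k < impl->num_trans; k++) {
            const Transition *t = &impl->trans[k];
            for (int s = 0; s < spec->num_states; s++) {
                /* Drop (t->src, s) if the spec cannot answer this move. */
                if (rel[t->src][s] &&
                    !spec_can_match(spec, s, t->action, t->dst, rel)) {
                    rel[t->src][s] = false;
                    changed = true;
                }
            }
        }
    }
    return rel[i0][s0];
}
```

The computation works on pairs of states and handles a non-deterministic
specification directly by trying each of its alternative moves, so no subset
construction over specification states is needed.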

cprog ssl3 = ssl3_accept, ssl3_connect {
abstract ssl3,

{$1->state == (0x110|0x2000),

$1->state == (0x04|(0x1000|0x2000))}, Ssl3;

} 
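
The two conditions in the abstract statement compare the state field against
the constants (0x110|0x2000) and (0x04|(0x1000|0x2000)), whereas the case
labels in the C listings are printed in decimal. The following C fragment is a
small sketch that evaluates both conditions for a given state value, assuming
that $1 denotes the first argument of the procedure. The names PredValuation
and evaluate_predicates are hypothetical and introduced only for illustration.

```c
/* Hypothetical pair of truth values, one per condition of the abstract statement. */
typedef struct {
    int p0; /* state == (0x110|0x2000) */
    int p1; /* state == (0x04|(0x1000|0x2000)) */
} PredValuation;

/* Evaluates both conditions for a concrete value of the state field. */
PredValuation evaluate_predicates(int state)
{
    PredValuation v;
    v.p0 = (state == (0x110 | 0x2000));           /* 0x2110, i.e. 8464 in decimal */
    v.p1 = (state == (0x04 | (0x1000 | 0x2000))); /* 0x3004, i.e. 12292 in decimal */
    return v;
}
```

For example, the value 4528 used as a case label in the client listing
satisfies neither condition.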

Ssl3  = 

( 

epsilon  ->  SrClntHelloA  | 
epsilon  ->  SwHelloReqA 

), 

SrClntHelloA  = 

( 

exch_client_hello  ->  SwSrvrHelloA  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwHelloReqA  = 

( 

send_hello_request  ->  SwFlushSwHelloReqC  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwFlushSwHelloReqC  = 

( 

epsilon  ->  SwHelloReqC  | 
return  {$0  ==  -1}  ->  STOP 
),

SwHelloReqC  = 

( 

epsilon  ->  Ok  | 

return  {$0  ==  -1}  ->  STOP 

), 

Ok  =  (  return  {$0  ==  1}  ->  STOP  ) , 

SwSrvrHelloA  = 

( 

exch_server_hello  ->  SwSrvrHelloAl  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwSrvrHelloAl  = 

( 

epsilon  ->  SwChangeA  | 
epsilon  ->  SwCertA  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwChangeA  = 

( 

send_change_cipher_spec  ->  SwChangeAl  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwChangeAl  = 

( 

epsilon  ->  SwFinishedA  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwCertA  = 

( 

exch_server_certificate -> SwCertAl |
epsilon  ->  SwCertAl  | 
return  {$0  ==  -1}  ->  STOP 

), 
SwCertAl =

( 

epsilon  ->  SwKeyExchA  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwKeyExchA  = 

( 

key_exchange_srvr_to_clnt  ->  SwKeyExchAl  | 
epsilon  ->  SwKeyExchAl  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwKeyExchAl  = 

( 

epsilon  ->  SwCertReqA  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwFinishedA  = 

( 

finished_srvr_to_clnt -> SwFinishedAl |
return  {$0  ==  -1}  ->  STOP 

), 

SwFinishedAl  = 

( 

epsilon  ->  SwFlushSrFinishedA  | 
epsilon  ->  SwFlushOk  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwFlushSrFinishedA  = 

( 

epsilon  ->  SrFinishedA  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwFlushOk  = 

( 

epsilon  ->  Ok  | 

return  {$0  ==  -1}  ->  STOP 
),

SwCertReqA  = 

( 

epsilon  ->  SwSrvrDoneA  | 

certificate_request_srvr_to_clnt -> SwFlushSrCertA |
return  {$0  ==  -1}  ->  STOP 

), 

SwFlushSrCertA  = 

( 

epsilon  ->  SrCertA  | 
return  {$0  ==  -1}  ->  STOP 

), 

SrCertA  = 

( 

check_client_hello  ->  SrCertAl  | 
return  {$0  ==  -1}  ->  STOP 

), 

SrCertAl  = 

( 

epsilon  ->  SrClntHelloC  | 
exch_client_certificate -> SrKeyExchA |
exch_client_certificate -> return {$0 == -1} -> STOP |
return  {$0  ==  -1}  ->  STOP 

), 

SrFinishedA  = 

( 

finished_clnt_to_srvr -> SrFinishedAl |
return  {$0  ==  -1}  ->  STOP 

), 

SrFinishedAl  = 

( 

epsilon -> Ok |
epsilon  ->  SwChangeA  | 
return  {$0  ==  -1}  ->  STOP 

), 
SrClntHelloC = ( epsilon -> SrClntHelloA ),


SrKeyExchA  = 

( 

key_exchange_clnt_to_srvr  ->  SrCertVrfyA  | 
key_exchange_clnt_to_srvr  ->  return  {$0  ==  -1}  ->  STOP  | 
return  {$0  ==  -1}  ->  STOP 

), 

SrCertVrfyA  = 

( 

cert_verify_clnt_to_srvr  ->  SrFinishedA  | 
cert_verify_clnt_to_srvr  ->  return  {$0  ==  -1}  ->  STOP  | 
return  {$0  ==  -1}  ->  STOP 

), 

SwSrvrDoneA  = 

( 

send_server_done  ->  SwFlushSrCertA  | 
return  {$0  ==  -1}  ->  STOP 

). 